Method of identifying disease risk factors

ABSTRACT

Provided herein is a method for identifying a genetic variant that is associated with development of a condition of interest (e.g., Alzheimer's disease), and genetic variants so identified. Methods of treatment with an active agent (e.g., with a particular active agent and/or at an earlier age) are also provided, upon detecting a genetic variant described herein. In some embodiments, the genetic variant is a deletion/insertion polymorphism (DIP) of the TOMM40 gene. Kits for determining if a subject is at increased risk of developing late onset Alzheimer's disease are also provided. Kits for determining if a subject is responsive to treatment for a condition of interest with an active agent are further provided.

RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/021,261, filed Jun. 28, 2018, now allowed, which is a continuation of U.S. patent application Ser. No. 14/332,867, filed Jul. 16, 2014, now abandoned, which is a continuation of U.S. patent application Ser. No. 13/058,724, filed Apr. 14, 2011, now U.S. Pat. No. 8,815,508, which is a national stage of International Patent Application No. PCT/US2009/053373, filed Aug. 11, 2009, which claims the benefit of U.S. Provisional Application No. 61/088,203, filed Aug. 12, 2008, U.S. Provisional Application No. 61/186,673, filed Jun. 12, 2009, and U.S. Provisional Application No. 61/224,647, filed Jul. 10, 2009, the disclosures of each of which is incorporated by reference herein in its entirety.

STATEMENT REGARDING ELECTRONIC FILING OF A SEQUENCE LISTING

A Sequence Listing in ASCII text format, submitted under 37 C.F.R. § 1.821, entitled 9719-2CT3_ST25.txt, 9,979 bytes in size, generated on Nov. 10, 2020, and filed via EFS-Web, is provided in lieu of a paper copy. This Sequence Listing is incorporated by reference into the specification for its disclosures.

FIELD OF THE INVENTION

The present invention relates to the field of genomics, genetics, pharmacogenetics, and bioinformatics, including genome analysis and the study of DNA sequence variation. The invention also relates to studies of association between variations in DNA sequences and anticipation of an individual's susceptibility to a particular disease, disorder, or condition and/or response to a particular drug or treatment.

BACKGROUND OF THE INVENTION

The search for genetic markers associated with complex diseases is ongoing. Genome-wide scanning studies with SNP arrays continue to highlight the ApoE region as the most important area for investigation in the study of Alzheimer's disease (Coon et al., J. Clin. Psychiatry 68: 613-8 (2007); Li et al., Arch. Neurol. 65: 45-53 (2007)).

The ApoE 4 isoform has previously been strongly associated with increased risk of developing late-onset Alzheimer's disease (Pericak-Vance et al., Am. J. Hum. Genet. 48, 1034-50 (1991); Martin et al., 2000; U.S. Pat. No. 6,027,896 to Roses et al.; U.S. Pat. No. 5,716,828 to Roses et al.). The relationship is dose dependent (Yoshizawa et al., 1994; Schellenberg, 1995). That is to say, a carrier of two ApoE 4 alleles is more likely to develop late-onset Alzheimer's disease (LOAD) than a carrier of only one ApoE 4 allele, and at an earlier age (Corder et al., Science 261, 921-3 (1993)).

Nevertheless, E4 alleles only account for roughly 50% of hereditary Alzheimer's disease. One explanation is that ApoE 4 is merely serving as a surrogate marker for something in linkage disequilibrium nearby. Alternatively, considering the recent discovery of a mechanistic role for ApoE 4 in mitochondrial toxicity, the negative effects of ApoE 4 may be abrogated or exacerbated by another gene product encoded nearby (Chang et al., 2005).

As ApoE status is also associated with risk for coronary artery disease and likely also a host of other diseases and disorders, the implications of the study of the ApoE region are not limited to Alzheimer's disease, but are potentially far-reaching (Mahley et al., Proc. Natl. Acad. Sci. USA 103: 5644-51 (2006)). More broadly, the examination of variant sequences for processes or pathways surrounding genes in linkage disequilibrium with other genetic regions known to be involved in complex disease processes will provide valuable information in deciphering the mechanisms of those diseases.

SUMMARY OF THE INVENTION

Provided herein is a method for identifying a genetic variant that is associated with development of a condition of interest (e.g., earlier or later onset of a disease of interest), comprising: (a) determining from biological samples containing DNA the nucleotide sequences carried by a plurality of individual human subjects at a genetic locus of interest, wherein subjects include both (i) subjects affected with the condition of interest and (ii) subjects unaffected with the condition of interest; (b) identifying genetic variants at said genetic locus from nucleotide sequences observed in said plurality of subjects (e.g., using a multiple sequence alignment analysis); (c) mapping said genetic variants by constructing a phylogenetic tree of said nucleotide sequences of said subjects, said tree comprising branches that identify variant changes between said subjects (e.g., variant changes on the same cistron); (d) examining the genetic variants represented as branches in said tree and determining the ratio of affected and unaffected subjects to identify those changes that lead to a changed ratio of affected to unaffected subjects (preferably wherein the starting point is the genetic variant representing the greatest number of subjects); and then (e) identifying a genetic variant or group of variants (a haplotype) where the ratio of affected to unaffected subjects is substantially different from one or more adjacent variants on said tree (e.g., at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, or 90% different) to thereby identify a genetic variant associated with the development of said condition of interest.
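By way of illustration only, steps (d) and (e) may be sketched in Python as follows. The sketch assumes the phylogenetic tree of step (c) has already been constructed and that each branch carries counts of affected and unaffected subjects. The `Clade` class, the `flag_divergent_branches` function, and the use of the fraction of affected subjects as the compared quantity are hypothetical choices made for this example and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Clade:
    """One branch of a hypothetical, already-built phylogenetic tree."""
    name: str
    affected: int      # subjects on this branch affected with the condition
    unaffected: int    # subjects on this branch unaffected with the condition
    children: List["Clade"] = field(default_factory=list)

    def fraction_affected(self) -> Optional[float]:
        total = self.affected + self.unaffected
        return self.affected / total if total else None


def flag_divergent_branches(
    root: Clade, threshold: float = 0.20
) -> List[Tuple[str, str, float]]:
    """Report parent/child branch pairs whose affected fractions differ
    by at least `threshold` (an assumed stand-in for "substantially
    different"; the cutoff itself is a free parameter)."""
    flagged = []
    stack = [root]
    while stack:
        parent = stack.pop()
        p = parent.fraction_affected()
        for child in parent.children:
            c = child.fraction_affected()
            if p is not None and c is not None and abs(c - p) >= threshold:
                flagged.append((parent.name, child.name, round(c - p, 3)))
            stack.append(child)
    return flagged
```

A branch pair returned by `flag_divergent_branches` would then be a candidate for closer examination of the variant change that separates the two branches.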

In some embodiments, all subjects carry a same known polymorphism that is associated with the condition of interest.

In some embodiments, the condition of interest is a neurodegenerative disease, a metabolic disease (e.g., dyslipidemia), a cardiovascular disease, a psychiatric disorder, or cancer. In some embodiments, the disease of interest is a disease in which ApoE and/or TOMM40 are implicated in disease pathogenesis.

In some embodiments, the condition of interest is associated with increased or decreased mitochondrial dysfunction. In some embodiments, the condition of interest is schizophrenia. In some embodiments, the condition of interest is coronary artery disease. In some embodiments, the condition of interest is diabetes mellitus, type II. In some embodiments, the condition of interest is Parkinson's disease. In some embodiments, the condition of interest is Alzheimer's disease.

In some embodiments, the known polymorphism risk factor is the Apolipoprotein E allele (e.g., ApoE 2, ApoE 3 or ApoE 4).

In some embodiments, the genetic locus of interest is in linkage disequilibrium with the known polymorphism. In some embodiments, the genetic locus of interest is on the same chromosome and less than 10, 20, 30, 40, or 50 kilobases away from the known polymorphism. In some embodiments, the genetic locus is TOMM40.

Also provided is a method of determining increased risk for development of a condition of interest, comprising: (a) determining from a biological sample containing DNA a genetic variant identified by the method of any of the preceding paragraphs carried by an individual subject; and then (b) determining the subject is at increased risk for development of the condition of interest when the genetic variant is present.

Further provided is a method of determining increased risk for development of Alzheimer's disease in a subject (e.g., a subject carrying at least one Apo E3 allele), comprising: (a) detecting from a biological sample containing DNA taken from the subject the presence or absence of a genetic variant of the TOMM40 gene associated with increased or decreased risk of Alzheimer's disease; and (b) determining the subject is at increased or decreased risk of Alzheimer's disease when the genetic variant is present or absent.

In some embodiments, it is determined whether the subject is an Apo E2/E2, E2/E3, E2/E4, E3/E3, E3/E4, or E4/E4 subject. In some embodiments, it is determined whether the subject is an Apo E3/E3 or E3/E4 subject.
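The six ApoE genotypes recited above are the unordered pairs of the three alleles E2, E3, and E4. As a minimal sketch, assuming genotype calls are written as strings such as "E3/E4" (a notation chosen for this example, as are the function names), the genotypes may be enumerated and a call normalized as follows.

```python
from itertools import combinations_with_replacement
from typing import List

APOE_ALLELES = ("E2", "E3", "E4")


def apoe_genotypes() -> List[str]:
    """Enumerate the unordered allele pairs, lowest allele first."""
    return ["/".join(pair) for pair in combinations_with_replacement(APOE_ALLELES, 2)]


def normalize_genotype(call: str) -> str:
    """Return a genotype call in canonical order, e.g. 'E4/E3' -> 'E3/E4'."""
    alleles = call.strip().upper().split("/")
    if len(alleles) != 2 or any(a not in APOE_ALLELES for a in alleles):
        raise ValueError(f"not a recognized ApoE genotype call: {call!r}")
    return "/".join(sorted(alleles))
```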

In some embodiments, the method further includes the step of: (c) administering an anti-Alzheimer's disease active agent to the subject in a treatment effective amount when the subject is determined to be at increased risk of Alzheimer's disease.

In some embodiments, the administering step is carried out in the subject at an earlier age when the subject is determined to be at increased risk by the presence or absence of the genetic variant as compared to a subject in which the genetic variant is not present or absent (e.g., for an ApoE 4/4 subject, beginning at age 45, 46, 47, 48, 49, 50, 51, 52, or 53, and continuously through each year thereafter, rather than beginning at age 55 or more; for an ApoE 4/3 subject, at age 50, 51, 52, 53, 54, 55, 56, 57, or 58, and continuously through each year thereafter, rather than beginning at age 60 or more; for an ApoE 3/3 subject, at age 55, 56, 57, 58, 59, 60, 61, 62, or 63, and continuously through each year thereafter, rather than beginning at age 65 or more; and for an ApoE 2/3 subject, at age 60, 61, 62, 63, 64, 65, 66, 67, or 68, and continuously through each year thereafter, rather than beginning at age 70 or more).

In some embodiments, the active agent is selected from the group consisting of acetylcholinesterase inhibitors, NMDA receptor antagonists, PPAR agonists or modulators (e.g., drugs in the thiazolidinedione or glitazar classes), antibodies, fusion proteins, therapeutic RNA molecules, and combinations thereof. In some embodiments, the active agent is rosiglitazone or a pharmaceutically acceptable salt thereof.

In some embodiments, the genetic variant of the TOMM40 gene is a variant listed in Table 1 as set forth below.

Also provided is a method of treating a subject (e.g., a subject having at least one ApoE 3 allele) for Alzheimer's disease by administering an anti-Alzheimer's disease active agent to the subject in a treatment-effective amount; the improvement comprising: administering the active agent to the subject at an earlier age when the subject carries a genetic variant of the TOMM40 gene associated with increased risk of Alzheimer's disease as compared to a corresponding subject who does not carry the genetic variant (e.g., for an ApoE 4/4 subject, beginning at age 45, 46, 47, 48, 49, 50, 51, 52, or 53, and continuously through each year thereafter, rather than beginning at age 55 or more; for an ApoE 4/3 subject, at age 50, 51, 52, 53, 54, 55, 56, 57, or 58, and continuously through each year thereafter, rather than beginning at age 60 or more; for an ApoE 3/3 subject, at age 55, 56, 57, 58, 59, 60, 61, 62, or 63, and continuously through each year thereafter, rather than beginning at age 65 or more; and for an ApoE 2/3 subject, at age 60, 61, 62, 63, 64, 65, 66, 67, or 68, and continuously through each year thereafter, rather than beginning at age 70 or more).

In some embodiments, the subject is an Apo E2/E2, E2/E3, E2/E4, E3/E3, E3/E4, or E4/E4 subject. In some embodiments, the subject is an Apo E3/E3 or E3/E4 subject.

In some embodiments, the active agent is selected from the group consisting of acetylcholinesterase inhibitors, NMDA receptor antagonists, PPAR agonists or modulators (e.g., drugs in the thiazolidinedione or glitazar classes), antibodies, fusion proteins, therapeutic RNA molecules, and combinations thereof. In some embodiments, the active agent is rosiglitazone or a pharmaceutically acceptable salt thereof.

In some embodiments, the genetic variant of the TOMM40 gene is a deletion/insertion polymorphism (DIP). In some embodiments, the DIP is an insertion polymorphism. In some embodiments, the DIP is a poly-T deletion/insertion polymorphism (e.g., between 5 and 100, or 10 and 80, or 20 and 50 bp poly-T).
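For illustration, the length of a poly-T tract in a nucleotide sequence read may be measured with a simple scan for the longest uninterrupted run of T. The following Python sketch assumes the read is available as a plain string of bases; the function names and the default 5 to 100 bp window (taken from the broadest example range above) are choices made for this example only.

```python
import re


def longest_poly_t(sequence: str) -> int:
    """Length in bases of the longest uninterrupted run of T in the read."""
    runs = re.findall(r"T+", sequence.upper())
    return max((len(run) for run in runs), default=0)


def poly_t_in_range(sequence: str, low: int = 5, high: int = 100) -> bool:
    """True when the longest poly-T run falls within [low, high] bases."""
    return low <= longest_poly_t(sequence) <= high
```

A practical genotyping assay would also have to anchor the tract to its flanking sequence and account for sequencing error in homopolymer runs, which this sketch does not attempt.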

In some embodiments, the genetic variant of the TOMM40 gene is a variant listed in Table 1 as set forth below. In some embodiments, the DIP is rs10524523, rs10602329 or DIP3. In some embodiments, the DIP is rs10524523.

Further provided is a method of treatment for a condition of interest, wherein the condition of interest is associated with ApoE and/or TOMM40, for a patient in need thereof, the method including the steps: (a) determining the presence or absence of a genetic variant identified by the method of paragraph 1-12 carried by an individual subject to generate a genetic profile of the patient; and then, if the profile is indicative of the patient being responsive to an active agent, (b) administering the active agent to the subject in a treatment effective amount to treat the condition of interest.

In some embodiments, the active agent is selected from the group consisting of acetylcholinesterase inhibitors, NMDA receptor antagonists, PPAR agonists or modulators (e.g., drugs in the thiazolidinedione or glitazar classes), antibodies, fusion proteins, therapeutic RNA molecules, and combinations thereof. In some embodiments, the active agent is rosiglitazone or a pharmaceutically acceptable salt thereof.

In some embodiments, the genetic variant of the TOMM40 gene is a deletion/insertion polymorphism (DIP). In some embodiments, the DIP is an insertion polymorphism. In some embodiments, the DIP is a poly-T deletion/insertion polymorphism (e.g., between 5 and 100, or 10 and 80, or 20 and 50 bp poly-T insertion).

In some embodiments, the genetic variant of the TOMM40 gene is a variant listed in Table 1 as set forth below. In some embodiments, the DIP is rs10524523, rs10602329 or DIP3. In some embodiments, the DIP is rs10524523.

Also provided is a method of treatment for Alzheimer's disease in a subject, including: (a) detecting from a biological sample containing DNA taken from the subject the presence or absence of a genetic variant of the TOMM40 gene associated with responsiveness to an active agent; and, if the genetic variant is present, (b) administering the active agent to the subject in a treatment effective amount to treat the Alzheimer's disease.

In some embodiments, the subject carries at least one ApoE 3 allele. In some embodiments, the subject is an Apo E3/E3 or E3/E4 subject.

In some embodiments, the active agent is selected from the group consisting of acetylcholinesterase inhibitors, NMDA receptor antagonists, PPAR agonists or modulators (e.g., drugs in the thiazolidinedione or glitazar classes), antibodies, fusion proteins, therapeutic RNA molecules, and combinations thereof. In some embodiments, the active agent is rosiglitazone or a pharmaceutically acceptable salt thereof.

In some embodiments, the genetic variant of the TOMM40 gene is a deletion/insertion polymorphism (DIP). In some embodiments, the DIP is an insertion polymorphism. In some embodiments, the DIP is a poly-T deletion/insertion polymorphism (e.g., between 5 and 100, or 10 and 80, or 20 and 50 bp poly-T).

In some embodiments, the genetic variant of the TOMM40 gene is a variant listed in Table 1 as set forth below. In some embodiments, the DIP is rs10524523, rs10602329 or DIP3. In some embodiments, the DIP is rs10524523.

Further provided is the use of an anti-Alzheimer's disease active agent for the preparation of a medicament for carrying out a method of treatment for Alzheimer's disease in accordance with the paragraphs set forth above. Also provided is the use of an anti-Alzheimer's disease active agent for carrying out a method of treatment for Alzheimer's disease.

A method of determining a prognosis for a patient at risk for developing Alzheimer's disease is provided, including obtaining a patient profile, wherein the obtaining a patient profile includes: detecting the presence or absence of at least one ApoE allele in a biological sample of the patient, and detecting the presence or absence of at least one TOMM40 deletion/insertion polymorphism (DIP) located in intron 6 or intron 9 of the TOMM40 gene, and then, converting the patient profile into the prognosis, wherein the presence of the ApoE allele and the presence of the at least one TOMM40 DIP polymorphism identifies the patient as a patient at risk for developing Alzheimer's disease.
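The conversion of a patient profile into a prognosis may be pictured, in its simplest form, as a conjunction of the two detection results. The Python sketch below is illustrative only: the `PatientProfile` structure, the `at_risk` function, and the representation of detected alleles and DIPs as sets of strings are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PatientProfile:
    """Hypothetical record of the two detection steps."""
    apoe_alleles: FrozenSet[str]   # ApoE alleles detected, e.g. {"E3", "E4"}
    tomm40_dips: FrozenSet[str]    # intron 6 / intron 9 TOMM40 DIPs detected


def at_risk(profile: PatientProfile, apoe_allele: str) -> bool:
    """True when the given ApoE allele and at least one TOMM40 DIP are present."""
    return apoe_allele in profile.apoe_alleles and len(profile.tomm40_dips) > 0
```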

In some embodiments, the DIP is an insertion polymorphism. In some embodiments, the DIP is a poly-T deletion/insertion polymorphism (e.g., between 5 and 100, or 10 and 80, or 20 and 50 poly-T).

In some embodiments, the DIP is rs10524523, rs10602329 or DIP3. In some embodiments, the DIP is rs10524523.

In some embodiments, the method further includes detecting whether the subject is an Apo E2/E2, E2/E3, E2/E4, E3/E3, E3/E4, or E4/E4 subject. In some embodiments, the subject is an Apo E3/E3 or E3/E4 subject.

Also provided is a method for stratifying a subject into a subgroup of a clinical trial of a therapy for the treatment of Alzheimer's disease, the method including: detecting the presence or absence of at least one ApoE allele in a biological sample of the patient, and detecting the presence or absence of at least one TOMM40 deletion/insertion polymorphism (DIP) located in intron 6 or intron 9 of the TOMM40 gene, wherein the subject is stratified into the subgroup for the clinical trial of the therapy based upon the presence or absence of the at least one ApoE and/or TOMM40 DIP allele.
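As a non-limiting sketch, stratification may be implemented by grouping subjects on the pair of detection results. The input layout (subject identifier, ApoE genotype string, DIP presence flag) and the `stratify` function are hypothetical and serve only to show the grouping step.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Subject = Tuple[str, str, bool]  # (subject id, ApoE genotype, TOMM40 DIP present)


def stratify(subjects: Iterable[Subject]) -> Dict[Tuple[str, bool], List[str]]:
    """Group subject ids into subgroups keyed by (ApoE genotype, DIP present)."""
    groups: Dict[Tuple[str, bool], List[str]] = defaultdict(list)
    for subject_id, genotype, dip_present in subjects:
        groups[(genotype, bool(dip_present))].append(subject_id)
    return dict(groups)
```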

In some embodiments, the DIP is an insertion polymorphism. In some embodiments, the DIP is a poly-T insertion polymorphism (e.g., between 5 and 100, or 10 and 80, or 20 and 50 poly-T insertion).

In some embodiments, the DIP is rs10524523, rs10602329 or DIP3. In some embodiments, the DIP is rs10524523.

In some embodiments, the method further includes detecting whether the subject is an Apo E2/E2, E2/E3, E2/E4, E3/E3, E3/E4, or E4/E4 subject. In some embodiments, the subject is an Apo E3/E3 or E3/E4 subject.

Further provided is a method for identifying a patient in a clinical trial of a treatment for Alzheimer's disease including: a) identifying a patient diagnosed with Alzheimer's disease; and b) determining a prognosis for the patient diagnosed with Alzheimer's disease comprising obtaining a patient profile, wherein the patient profile comprises i) detecting the presence or absence of at least one ApoE allele in a biological sample of the patient, ii) detecting the presence or absence of at least one TOMM40 deletion/insertion polymorphism (DIP) located in intron 6 or intron 9 of the TOMM40 gene, and converting the patient profile into the prognosis, the prognosis including a prediction of whether the patient is a candidate for the clinical trial for the treatment of Alzheimer's disease.

In some embodiments, the DIP is an insertion polymorphism. In some embodiments, the DIP is a poly-T deletion/insertion polymorphism (e.g., between 5 and 100, or 10 and 80, or 20 and 50 poly-T).

In some embodiments, the DIP is rs10524523, rs10602329 or DIP3. In some embodiments, the DIP is rs10524523.

In some embodiments, the method further includes detecting whether the subject is an Apo E2/E2, E2/E3, E2/E4, E3/E3, E3/E4, or E4/E4 subject. In some embodiments, the subject is an Apo E3/E3 or E3/E4 subject.

A kit for determining if a subject is at increased risk of developing late onset Alzheimer's disease is provided, including: (A) at least one reagent that specifically detects ApoE 3, ApoE 4, or ApoE 2, wherein the reagent is selected from the group consisting of antibodies that selectively bind ApoE 3, ApoE 4, or ApoE 2, and oligonucleotide probes that selectively bind to DNA encoding the same; (B) at least one reagent that specifically detects the presence or absence of at least one TOMM40 deletion/insertion polymorphism (DIP) located in intron 6 or intron 9 of the TOMM40 gene; and (C) instructions for determining that the subject is at increased risk of developing late onset Alzheimer's disease by: (i) detecting the presence or absence of an ApoE isoform in the subject with the at least one reagent; (ii) detecting the presence or absence of at least one TOMM40 deletion/insertion polymorphism (DIP) located in intron 6 or intron 9 of the TOMM40 gene; and (iii) observing whether or not the subject is at increased risk of developing late onset Alzheimer's disease by observing if the presence of ApoE isoform and the TOMM40 DIP is or is not detected with the at least one reagent, wherein the presence of the ApoE isoform and the TOMM40 DIP indicates the subject is at increased risk of developing late onset Alzheimer's disease.

In some embodiments, the at least one reagent and the instructions are packaged in a single container.

In some embodiments, the DIP is an insertion polymorphism. In some embodiments, the DIP is a poly-T deletion/insertion polymorphism (e.g., between 5 and 100, or 10 and 80, or 20 and 50 poly-T).

In some embodiments, the DIP is rs10524523, rs10602329 or DIP3. In some embodiments, the DIP is rs10524523.

In some embodiments, the determining step further includes detecting whether the subject is an Apo E2/E2, E2/E3, E2/E4, E3/E3, E3/E4, or E4/E4 subject. In some embodiments, the subject is an Apo E3/E3 or E3/E4 subject.

A kit is provided for determining if a subject is responsive to treatment for a condition of interest, wherein the condition of interest is associated with ApoE and/or TOMM40, with an active agent, the kit including: (A) at least one reagent that specifically detects ApoE 3, ApoE 4, or ApoE 2, wherein the reagent is selected from the group consisting of antibodies that selectively bind ApoE 3, ApoE 4, or ApoE 2, and oligonucleotide probes that selectively bind to DNA encoding the same; (B) at least one reagent that specifically detects the presence or absence of at least one TOMM40 deletion/insertion polymorphism (DIP) located in intron 6 or intron 9 of the TOMM40 gene; and (C) instructions for determining that the subject is responsive to treatment for the condition of interest with the active agent of interest by: (i) detecting the presence or absence of an ApoE isoform in the subject with the at least one reagent; (ii) detecting the presence or absence of at least one TOMM40 deletion/insertion polymorphism (DIP) located in intron 6 or intron 9 of the TOMM40 gene; and (iii) determining whether or not the subject is responsive to treatment by observing if the presence of the ApoE isoform and the TOMM40 DIP is or is not detected with the at least one reagent, wherein the presence of ApoE 3 and the TOMM40 DIP indicates that the subject is responsive to the treatment with the active agent.

In some embodiments, the at least one reagent and the instructions are packaged in a single container.

In some embodiments, the DIP is an insertion polymorphism. In some embodiments, the DIP is a poly-T deletion/insertion polymorphism (e.g., between 5 and 100, or 10 and 80, or 20 and 50 bp poly-T).

In some embodiments, the DIP is rs10524523, rs10602329 or DIP3. In some embodiments, the DIP is rs10524523.

In some embodiments, the determining step further includes detecting whether the subject is an Apo E2/E2, E2/E3, E2/E4, E3/E3, E3/E4, or E4/E4 subject. In some embodiments, the subject is an Apo E3/E3 or E3/E4 subject.

It will be understood that all of the foregoing embodiments can be combined in any way and/or combination. The foregoing and other objects and aspects of the present invention are explained in greater detail in the drawings provided herewith and in the specification set forth below.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a general flowchart for identifying a genetic variant in a predetermined region of genomic sequence in a genetic locus of interest, which may be associated with a condition of interest, according to some embodiments.

FIG. 2 shows a graph of the mean age of onset of Alzheimer's disease as a function of the inheritance of the five common ApoE genotypes, and representing ApoE 4 as a risk factor for Alzheimer's disease (1993).

FIG. 3 shows Regions A, B, and C on Chromosome 19, which are exemplary genetic loci of interest. The TOMM40 gene is in close proximity to the ApoE gene and encodes a 40 kD protein directed to the outer mitochondrial membrane. TOMM40 interacts with ApoE directly in regulation of mitochondrial protein import, and a present hypothesis is that the presence of a particular TOMM40 variant(s) exacerbates the increased risk for Alzheimer's disease associated with the dose-dependent presence of the ApoE 3 allele.

FIG. 4 shows the phylogenetic tree that is formed using the sequence data for the AS case/control cohort of subjects. ‘A’ and ‘B’ refer to the two major clades that arise from the first branch point. The lengths of the various alleles of rs10524523 (523) in each of the terminal clades of this tree are indicated. The APOE allele that is linked in cis to each rs10524523 length allele is also indicated.

FIG. 5 is a schematic diagram of the phylogenetic tree based on Region B constructed for TOMM40, showing the percentages of ApoE phenotypes in two major groupings, or clades, of the TOMM40 variants in this region.

FIG. 6 is a schematic overview of the TOMM40 ApoE locus including an LD plot showing haplotype blocks and regions subject to primary sequencing in the exploratory (R1) (23 Kb) and confirmatory (R2) (10 Kb) studies (NCBI Build 36.3). The LD plot is shown for Hapmap data (CEU analysis panel), solid spine haplotype block definition, r² values with D′/LOD color scheme represented by different line characteristics.

FIGS. 7A and 7B show representations of the phylogenetic trees with separation of variants. FIG. 7A: SNP variants, clade A vs. B. E6-E11 represent TOMM40 exons and vertical lines indicate the approximate locations of the SNPs. Separation of the two main branches has strong bootstrap support (973/1000). FIG. 7B: rs10524523 length polymorphisms. Descriptive statistics are provided for each group of length polymorphisms. Several long haplotypes that formed individual outgroups in the tree, or very small clades, are in the group identified as ‘Remainder.’

FIGS. 8A to 8C present histograms of the length of the rs10524523 length polymorphism stratified by ApoE genotypes 3/3 (FIG. 8A), 3/4 (FIG. 8B), and 4/4 (FIG. 8C). N=210 haplotypes (AS cohort).
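A histogram of this kind may be tabulated, for example, by counting haplotypes per poly-T length within each ApoE genotype stratum. The Python sketch below assumes each haplotype is supplied as a (genotype, length) pair; that input layout and the `length_histograms` function are hypothetical, and no data from the figures is reproduced.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def length_histograms(
    haplotypes: Iterable[Tuple[str, int]]
) -> Dict[str, Dict[int, int]]:
    """Count haplotypes per poly-T length, stratified by ApoE genotype.

    Returns {genotype: {length: count}} with lengths in ascending order.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for genotype, length in haplotypes:
        counts[genotype][length] += 1
    return {g: dict(sorted(c.items())) for g, c in counts.items()}
```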

FIG. 9 shows the association between AD age of onset and length of the rs10524523 polymorphism for AD patients with onset between 60 and 86 years. Box plots indicate the 95% range (vertical lines), median (horizontal line in box) and interquartile range (box).
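The summary statistics underlying such a box plot may be computed as sketched below. The sketch assumes that the "95% range" denotes the central 95% of observations (2.5th to 97.5th percentile) and uses the inclusive quantile method of the Python standard library; both are assumptions of this example, as is the `box_plot_summary` function name.

```python
import statistics
from typing import Dict, Sequence


def box_plot_summary(ages: Sequence[float]) -> Dict[str, float]:
    """Median, interquartile range, and central 95% range of onset ages."""
    if len(ages) < 2:
        raise ValueError("at least two observations are required")
    data = sorted(ages)
    # Quartiles: three cut points dividing the data into four groups.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # 40 groups of 2.5% each: first and last cut points bound the central 95%.
    cuts = statistics.quantiles(data, n=40, method="inclusive")
    return {
        "low_2.5": cuts[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "high_97.5": cuts[-1],
    }
```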

DETAILED DESCRIPTION

The present invention is explained in greater detail below. This description is not intended to be a detailed catalog of all the different ways in which the invention may be implemented, or all of the features that may be added to the instant invention. For example, features illustrated with respect to one embodiment may be incorporated into other embodiments, and features illustrated with respect to a particular embodiment may be deleted from that embodiment. In addition, numerous variations and additions to the various embodiments suggested herein will be apparent to those skilled in the art in light of the instant disclosure which do not depart from the instant invention. Hence, the following specification is intended to illustrate some particular embodiments of the invention, and not to exhaustively specify all permutations, combinations and variations thereof.

As used in the description of the invention and the appended claims, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Also, as used herein, “and/or” refers to and encompasses any and all possible combinations of one or more of the listed items, as well as the lack of combinations when interpreted in the alternative (“or”).

The present invention is directed to methods for revealing genetic variation in regions of particular interest for complex diseases and disorders. It also relates to the discovery of the most informative genetic markers on the basis of associations with phenotype information. In one embodiment, the invention may be used to locate genetic markers associated with susceptibility to a particular disease, disorder, or condition. In another embodiment, data regarding subject response to a candidate treatment or drug may be included in a phylogenetic analysis for the location of genetic markers associated with a beneficial response to that treatment or drug (i.e., pharmacogenetics). The methods can be applied on any data set of genetic variation from a particular locus. See FIG. 1 for a flowchart of the approach to finding genetic risk factors according to the present invention.

In one aspect, the analysis of the genetic variation is based on variant sequence data. In a second aspect, the structure is uncovered using diploid genotype data, thereby avoiding the need to either experimentally or computationally infer the component haplotypes (see, e.g., U.S. Pat. No. 6,027,896 to Roses et al.). In another aspect, the present method can be applied onto uncharacterized allelic variation that results from the interrogation of a target nucleic acid with an experimental procedure that provides a record of the sequence variation present but does not actually provide the entire sequence. The underlying structure of genetic variation is also useful for the deduction of the constituent haplotypes from diploid genotype data.

It is preferred and contemplated that the methods described herein be used in conjunction with other clinical diagnostic information known or described in the art which are used in evaluation of subjects with diseases or disorders (e.g., those believed to involve mitochondrial dysfunction (e.g., Alzheimer's disease or other neurodegenerative diseases)) or for evaluation of subjects suspected to be at risk for developing such disease. The invention is also applicable for discovery of genetic risk factors for other complex diseases, disorders, or conditions.

The disclosures of all United States patent references cited herein are hereby incorporated by reference herein in their entirety.

1. Definitions. The Following Definitions are Used Herein

“Condition of interest” refers to a specific condition, disease, or disorder designated for phylogenetic study and/or subsequent diagnosis or prognosis. “Condition” as used herein includes, but is not limited to, conditions associated with ApoE and/or TOMM40 and/or mitochondrial dysfunction, e.g., neurodegenerative diseases, metabolic diseases, psychiatric disorders, and cancer.

Examples of conditions in which ApoE and/or TOMM40 have been implicated include, but are not limited to, cardiovascular disease; metabolic disease; neurodegenerative disease; neurological trauma or disease; autoimmune disease (e.g., multiple sclerosis (Pinholt M, et al. Apo E in multiple sclerosis and optic neuritis: the apo E-epsilon4 allele is associated with progression of multiple sclerosis. Mult Scler. 11:511-5 (2005); Masterman, T. & Hillert, J. The telltale scan: APOE ε4 in multiple sclerosis. Lancet Neurol. 3: 331 (2004)), neuropsychiatric systemic lupus erythematosus (Pullmann Jr. R, et al. Apolipoprotein E polymorphism in patients with neuropsychiatric SLE. Clin Rheumatol. 23: 97-101 (2004)), etc.); viral infection (e.g., liver disease associated with hepatitis C infection (Wozniak M A, et al. Apolipoprotein E-ε4 protects against severe liver disease caused by hepatitis C virus. Hepatol. 36: 456-463 (2004)), HIV disease (Burt T D, et al. Apolipoprotein (apo) E4 enhances HIV-1 cell entry in vitro, and the APOE epsilon4/epsilon4 genotype accelerates HIV disease progression. Proc Natl Acad Sci USA. 105:8718-23 (2008)), etc.); hip fracture/osteoporosis (Pluijm S M, et al. Effects of gender and age on the association of apolipoprotein E epsilon4 with bone mineral density, bone turnover and the risk of fractures in older people. Osteoporos Int 13: 701-9 (2002)); mitochondrial diseases (Chang S, et al. Lipid- and receptor-binding regions of apolipoprotein E4 fragments act in concert to cause mitochondrial dysfunction and neurotoxicity. Proc Natl Acad Sci USA. 102:18694-9 (2005)); aging (Schachter F, et al. Genetic associations with human longevity at the APOE and ACE loci. Nat Genet. 6:29-32 (1994); Rea I M, et al., Apolipoprotein E alleles in nonagenarian subjects in the Belfast Elderly Longitudinal Free-living Ageing Study (BELFAST). Mech. Aging and Develop. 122: 1367-1372 (2001)); inflammation (Li L, et al., Infection induces a positive acute phase apolipoprotein E response from a negative acute phase gene: role of hepatic LDL receptors. J Lipid Res. 49:1782-93 (2008)); and memory dysfunction (Caselli R J, et al. Longitudinal modeling of age-related memory decline and the APOE epsilon4 effect. N Engl J Med. 361:255-63 (2009)).

“Cardiovascular disease” as used herein refers to a disease involving the heart and/or blood vessels, including, but not limited to, coronary artery disease (Song Y, et al. Meta-analysis: apolipoprotein E genotypes and risk for coronary heart disease. Ann Intern Med. 141:137-47 (2004); Bennet A M, et al., Association of apolipoprotein E genotypes with lipid levels and coronary risk. JAMA 298:1300-11 (2007)), atherosclerosis (Norata G D, et al. Effects of PCSK9 variants on common carotid artery intima media thickness and relation to ApoE alleles. Atherosclerosis (2009) Jun. 27. [Epub ahead of print], doi:10.1016/j.atherosclerosis.2009.06.023; Paternoster L, et al. Association Between Apolipoprotein E Genotype and Carotid Intima-Media Thickness May Suggest a Specific Effect on Large Artery Atherothrombotic Stroke. Stroke 39:48-54 (2008)), ischemic heart disease (Schmitz F, et al., Robust association of the APOE 4 allele with premature myocardial infarction especially in patients without hypercholesterolaemia: the Aachen study. Eur. J Clin. Investigation 37: 106-108 (2007)), vascular disease such as ischemic stroke (Peck G, et al. The genetics of primary haemorrhagic stroke, subarachnoid haemorrhage and ruptured intracranial aneurysms in adults. PLoS One. 3:e3691 (2008); Paternoster L, et al. Association Between Apolipoprotein E Genotype and Carotid Intima-Media Thickness May Suggest a Specific Effect on Large Artery Atherothrombotic Stroke. Stroke 39:48-54 (2008)), vascular dementia (Bang O Y, et al. Important link between dementia subtype and apolipoprotein E: a meta-analysis. Yonsei Med J. 44:401-13 (2003); Baum L, et al. Apolipoprotein E epsilon4 allele is associated with vascular dementia. Dement Geriatr Cogn Disord. 22:301-5 (2006)), etc.

“Neurodegenerative disease” as used herein refers to Alzheimer's disease (Corder E H, et al. Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer's disease in late onset families. Science 261:921-3 (1993); Corder E H, et al. There is a pathologic relationship between ApoE-epsilon 4 and Alzheimer's disease. Arch Neurol. 52:650-1 (1995)), Parkinson's disease (Huang X, et al. Apolipoprotein E and dementia in Parkinson disease: a meta-analysis. Arch Neurol. 63:189-93 (2006); Huang X et al. APOE-[epsilon]2 allele associated with higher prevalence of sporadic Parkinson disease. Neurology 62:2198-202 (2004); Martinez, M. et al. Apolipoprotein E4 is probably responsible for the chromosome 19 linkage peak for Parkinson's disease. Am. J. Med. Genet. B Neuropsychiatr. Genet. 136B, 172-174 (2005)), Huntington's disease, and a plurality of less common diseases and disorders which cause neurons to decline, e.g., age-related macular degeneration (Thakkinstian A, et al. Association between apolipoprotein E polymorphisms and age-related macular degeneration: A HuGE review and meta-analysis. Am J Epidemiol. 164:813-22 (2006); Bojanowski C M, et al. An apolipoprotein E variant may protect against age-related macular degeneration through cytokine regulation. Environ Mol Mutagen. 47:594-602 (2006)).

“Neurological trauma or disease” includes, but is not limited to, outcome after head injury (Zhou W, et al. Meta-analysis of APOE4 allele and outcome after traumatic brain injury. J Neurotrauma. 25:279-90 (2008); Lo T Y, et al. Modulating effect of apolipoprotein E polymorphisms on secondary brain insult and outcome after childhood brain trauma. Childs Nerv Syst. 25:47-54 (2009)), migraine (Gupta R, et al. Polymorphism in apolipoprotein E among migraineurs and tension-type headache subjects. J Headache Pain. 10:115-20 (2009)), vasogenic edema (James M L, et al. Apolipoprotein E modifies neurological outcome by affecting cerebral edema but not hematoma size after intracerebral hemorrhage in humans. J Stroke Cerebrovasc Dis. 18:144-9 (2009); James M L, et al. Pharmacogenomic effects of apolipoprotein e on intracerebral hemorrhage. Stroke 40:632-9 (2009)), etc.

“Metabolic disease” as used herein includes, but is not limited to, dyslipidemia (Willer C J, et al. Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat Genet. 40:161-9 (2008); Bennet A M, et al., Association of apolipoprotein E genotypes with lipid levels and coronary risk. JAMA 298:1300-11 (2007)), end stage renal disease (Oda H, et al. Apolipoprotein E polymorphism and renal disease. Kidney Int Suppl. 71:S25-7 (1999); Hubacek J A, et al. Apolipoprotein E Polymorphism in Hemodialyzed Patients and Healthy Controls. Biochem Genet. (2009) Jun. 30. [Epub ahead of print] DOI 10.1007/s10528-009-9266-y.), chronic kidney disease (Yoshida T, et al. Association of a polymorphism of the apolipoprotein E gene with chronic kidney disease in Japanese individuals with metabolic syndrome. Genomics 93:221-6 (2009); Leiva E, et al. Relationship between Apolipoprotein E polymorphism and nephropathy in type-2 diabetic patients. Diabetes Res Clin Pract. 78:196-201 (2007)), gallbladder disease (Boland L L, et al. Apolipoprotein E genotype and gallbladder disease risk in a large population-based cohort. Ann Epidemiol. 16:763-9 (2006); Andreotti G, et al. Polymorphisms of genes in the lipid metabolism pathway and risk of biliary tract cancers and stones: a population-based case-control study in Shanghai, China. Cancer Epidemiol Biomarkers Prev. 17:525-34 (2008)), diabetes mellitus (type II) (Elosua R, et al. Obesity Modulates the Association among APOE Genotype, Insulin, and Glucose in Men. Obes Res. 11:1502-1508 (2003); Moreno J A, et al. The Apolipoprotein E Gene Promoter (−219G/T) Polymorphism Determines Insulin Sensitivity in Response to Dietary Fat in Healthy Young Adults. J. Nutr. 135:2535-2540 (2005)), metabolic syndrome, cholelithiasis (Abu Abeid S, et al. Apolipoprotein-E genotype and the risk of developing cholelithiasis following bariatric surgery: a clue to prevention of routine prophylactic cholecystectomy. Obes Surg. 12:354-7 (2002)), etc.

“Psychiatric disorder” as used herein refers to schizophrenia (Kampman O, et al. Apolipoprotein E polymorphism is associated with age of onset in schizophrenia. J Hum Genet. 49:355-9 (2004); Dean B. et al., Plasma apolipoprotein E is decreased in schizophrenia spectrum and bipolar disorder. Psychiatry Res. 158:75-78 (2008)), obsessive compulsive disorder (OCD), addictive behavior (smoking addiction, alcohol addiction, etc.), bipolar disorder (Dean B. et al., Plasma apolipoprotein E is decreased in schizophrenia spectrum and bipolar disorder. Psychiatry Res. 158:75-78 (2008)), and other diseases, disorders, or conditions of a psychiatric nature.

“Development of a condition” as used herein refers to either an initial diagnosis of a disease, disorder, or other medical condition, or exacerbation of an existing disease, disorder, or medical condition for which the subject has already been diagnosed.

“Diagnosis” or “prognosis” as used herein refers to the use of information (e.g., genetic information or data from other molecular tests on biological samples, signs and symptoms, physical exam findings, cognitive performance results, etc.) to anticipate the most likely outcomes, timeframes, and/or response to a particular treatment for a given disease, disorder, or condition, based on comparisons with a plurality of individuals sharing common nucleotide sequences, symptoms, signs, family histories, or other data relevant to consideration of a patient's health status.

“Biological sample” as used herein refers to a material suspected of containing a nucleic acid of interest. Biological samples containing DNA include hair, skin, cheek swab, and biological fluids such as blood, serum, plasma, sputum, lymphatic fluid, semen, vaginal mucus, feces, urine, spinal fluid, and the like. Isolation of DNA from such samples is well known to those skilled in the art.

A “subject” according to some embodiments is an individual whose genotype(s) or haplotype(s) are to be determined and recorded in conjunction with the individual's condition (i.e., disease or disorder status) and/or response to a candidate drug or treatment. Nucleotide sequences from a plurality of subjects are used to construct a phylogenetic tree, and then analogous nucleotide sequences from an individual subject may be compared to those on the phylogenetic tree for diagnostic or prognostic purposes.

“Gene” as used herein means a segment of DNA that contains all the information for the regulated biosynthesis of an RNA product, including promoters, exons, introns, and other untranslated regions that control expression.

“Genetic locus” or “locus” as used herein means a location on a chromosome or DNA molecule, often corresponding to a gene or a physical or phenotypic feature or to a particular nucleotide or stretch of nucleotides. Loci is the plural form of locus.

“Amplification,” as applied to nucleic acids herein, refers to any method that results in the formation of one or more copies of a nucleic acid, where preferably the amplification is exponential. One such method for enzymatic amplification of specific sequences of DNA is known as the polymerase chain reaction (PCR), as described by Saiki et al., 1986, Science 230:1350-1354. Primers used in PCR normally vary in length from about 10 to 50 or more nucleotides, and are typically selected to be at least about 15 nucleotides to ensure sufficient specificity. The double stranded fragment that is produced is called an “amplicon,” and may vary in length from as few as about 30 nucleotides, to 20,000 or more.

A “marker” or “genetic marker” as used herein is a known variation of a DNA sequence at a particular locus. The variation may be present in an individual due to mutation or inheritance. A genetic marker may be a short DNA sequence, such as a sequence surrounding a single base-pair change (single nucleotide polymorphism, SNP), or a long one, like minisatellites. Markers can be used to study the relationship between an inherited disease and its genetic cause (for example, a particular mutation of a gene that results in a defective or otherwise undesirable form of protein).

A “genetic risk factor” as used herein means a genetic marker that is associated with increased susceptibility to a condition, disease, or disorder. It may also refer to a genetic marker that is associated with a particular response to a selected drug or treatment of interest.

“Associated with” as used herein means the occurrence together of two or more characteristics more often than would be expected by chance alone. An example of association involves a feature on the surface of white blood cells called HLA (HLA stands for human leukocyte antigen). A particular HLA type, HLA type B-27, is associated with an increased risk for a number of diseases including ankylosing spondylitis. Ankylosing spondylitis is 87 times more likely to occur in people with HLA B-27 than in the general population.

A subject “at increased risk of developing a condition” due to a genetic risk factor is one who is predisposed to the condition, has genetic susceptibility for the condition, and/or is more likely to develop the condition than subjects in which the genetic risk factor is absent. For example, a subject who is “at increased risk of developing Alzheimer's disease” due to the presence of one or two ApoE 4 alleles is more likely to develop Alzheimer's disease than a subject who does not carry an ApoE 4 allele.

“Polymorphism” as used herein refers to the existence of two or more different nucleotide sequences at a particular locus in the DNA of the genome. Polymorphisms can serve as genetic markers and may also be referred to as genetic variants. Polymorphisms include nucleotide substitutions, insertions, deletions and microsatellites, and may, but need not, result in detectable differences in gene expression or protein function. A polymorphic site is a nucleotide position within a locus at which the nucleotide sequence varies from a reference sequence in at least one individual in a population.

A “deletion/insertion polymorphism” or “DIP” as used herein is an insertion of one or more nucleotides in one version of a sequence relative to another. If it is known which of the alleles represent minor alleles, the term “deletion” is used when the minor allele is a deletion of a nucleotide, and the term “insertion” is used when the minor allele is an addition of a nucleotide. The term “deletion/insertion polymorphism” is also used when there are multiple forms or lengths and the minor allele is not apparent. For example, for the poly-T polymorphisms described herein, multiple lengths of polymorphisms are observed.

“Polymorphism data” as used herein means information concerning one or more of the following for a specific gene: location of polymorphic sites; sequence variation at those sites; frequency of polymorphisms in one or more populations; the different genotypes and/or haplotypes determined for the gene; frequency of one or more of these genotypes and/or haplotypes in one or more populations; and any known association(s) between a trait and a genotype or a haplotype for the gene.

“Haplotype” as used herein refers to a genetic variant or combination of variants carried on at least one chromosome in an individual. A haplotype often includes multiple contiguous polymorphic loci. All parts of a haplotype as used herein occur on the same copy of a chromosome or haploid DNA molecule. Absent evidence to the contrary, a haplotype is presumed to represent a combination of multiple loci that are likely to be transmitted together during meiosis. Each human carries a pair of haplotypes for any given genetic locus, consisting of sequences inherited on the homologous chromosomes from two parents. These haplotypes may be identical or may represent two different genetic variants for the given locus. Haplotyping is a process for determining one or more haplotypes in an individual. Haplotyping may include use of family pedigrees, molecular techniques and/or statistical inference.

A “variant,” “variance,” or “genetic variant” as used herein, refers to a specific isoform of a haplotype in a population, the specific form differing from other forms of the same haplotype in the sequence of at least one, and frequently more than one, variant sites or nucleotides within the sequence of the gene. The sequences at these variant sites that differ between different alleles of a gene are termed “gene sequence variants,” “alleles,” “variances” or “variants.” The term “alternative form” refers to an allele that can be distinguished from other alleles by having at least one, and frequently more than one, variant sites within the gene sequence. Other terms known in the art to be equivalent to “variances” or “variants” include mutations and single nucleotide polymorphisms (SNPs). Reference to the presence of a variance or variances means particular variances, i.e., particular nucleotides at particular polymorphic sites, rather than just the presence of any variance in the gene.

“Isoform” as used herein means a particular form of a gene, mRNA, cDNA or the protein encoded thereby, distinguished from other forms by its particular sequence and/or structure. An example is the ApoE 4 isoform of apolipoprotein E, as opposed to the ApoE 2 or ApoE 3 isoforms.

“Cistron” as used herein means a section of DNA found on a single chromosome that contains the genetic code for a single polypeptide and functions as a hereditary unit. A cistron includes exons, introns, and regulatory elements related to a single functional unit (i.e., a gene). The term derives from the classic cis-trans test for determining whether genetic elements were able to functionally interact regardless of whether they were located on the same DNA molecule (“trans” complementation) or only when they were located on the same DNA molecule (“cis” acting elements).

The term “genotype” in the context of this invention refers to the particular allelic form of a gene, which can be defined by the particular nucleotide(s) present in a nucleic acid sequence at a particular site(s). Genotype may also indicate the pair of alleles present at one or more polymorphic loci. For diploid organisms, such as humans, two haplotypes make up a genotype.

Genotyping is any process for determining a genotype of an individual, e.g., by nucleic acid amplification, antibody binding, or other chemical analysis. The resulting genotype may be unphased, meaning that the sequences found are not known to be derived from one parental chromosome or the other.

“Linkage disequilibrium” as used herein means the non-random association of alleles at two or more loci. Linkage disequilibrium describes a situation in which some combinations of alleles or genetic markers occur more or less frequently in a population than would be expected from a random formation of haplotypes from alleles based on their frequencies. Non-random associations between polymorphisms at different loci are measured by the degree of linkage disequilibrium.
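By way of illustration only, the degree of linkage disequilibrium between two alleles can be expressed with the commonly used coefficient D (the observed haplotype frequency minus the frequency expected under random association) and the derived statistic r². The following Python sketch assumes phased two-locus haplotypes are available; the function name and the example data are hypothetical and are not taken from the present disclosure.

```python
def linkage_disequilibrium(haplotypes, allele_a, allele_b):
    """Return (D, r_squared) for allele_a at locus 1 and allele_b at locus 2.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples,
    one tuple per phased chromosome (hypothetical input format).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    # Observed allele and haplotype frequencies.
    p_a = sum(1 for a, _ in haplotypes if a == allele_a) / n
    p_b = sum(1 for _, b in haplotypes if b == allele_b) / n
    p_ab = sum(1 for a, b in haplotypes if a == allele_a and b == allele_b) / n
    # D: observed minus expected frequency under random association.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either locus is monomorphic in the sample.
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


# Hypothetical sample of eight phased chromosomes.
sample = [("A", "T")] * 3 + [("G", "C")] * 3 + [("A", "C"), ("G", "T")]
d, r2 = linkage_disequilibrium(sample, "A", "T")
print(d, r2)
```

A value of D equal to zero corresponds to random association of the two alleles; values of r² closer to one indicate that the alleles are more nearly always found together.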

“Multiple sequence alignment” or “MSA” as used herein means alignment of three or more nucleotide sequences from genomic DNA derived from a plurality of individuals to determine homology and heterology between the sequences. In general, the input set of query sequences are assumed to have an evolutionary relationship by which they share a lineage and are descended from a common ancestor. Computer algorithms are most often used to perform the analysis of aligned sequences.

Some embodiments of the present invention are described with reference to block diagrams illustrating methods (e.g., FIG. 1), which may include steps implemented by a computer and/or computer program products. It will be understood that each block of the block diagrams and/or operational illustrations, and combinations of blocks in the block diagrams and/or operational illustrations, can be implemented by analog and/or digital hardware, and/or computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, ASIC, and/or other programmable data processing apparatus, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block diagrams and/or operational illustrations. Accordingly, it will be appreciated that the block diagrams and operational illustrations support apparatus, methods and computer program products.

Other software, such as an operating system, also may be included. It will be further appreciated that the functionality of the multiple sequence alignment module, mapping module and/or other modules described herein may be embodied, at least in part, using discrete hardware components, one or more Application Specific Integrated Circuits (ASIC) and/or one or more special purpose digital processors and/or computers.

“Mapping” as used herein means creating a phylogenetic tree by assigning a node to each new nucleotide sequence variant observed, connecting that node to another node representing a known sequence carried by the same individual on the same chromosome or cistron, and counting the numbers of each type of subject represented at each node. See FIG. 4 for an example of a phylogenetic tree developed in this manner.
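As a non-limiting sketch of the counting portion of this definition, the following Python fragment assigns one node per distinct observed sequence and tallies affected and unaffected subjects at each node. The input format (one phased sequence per observation, labeled with the carrier's status), the status labels, and the example sequences are assumptions made for illustration; connecting the nodes by branches is not shown.

```python
from collections import defaultdict


def count_subjects_per_node(observations):
    """Tally subjects at each node, one node per distinct variant sequence.

    observations: iterable of (sequence, status) pairs, where status is
    "affected" or "unaffected" (hypothetical labels).
    """
    nodes = defaultdict(lambda: {"affected": 0, "unaffected": 0})
    for sequence, status in observations:
        if status not in ("affected", "unaffected"):
            raise ValueError(f"unknown status: {status!r}")
        # The first sighting of a sequence creates its node.
        nodes[sequence][status] += 1
    return dict(nodes)


# Hypothetical observations; the sequences are placeholders only.
observed = [
    ("ACGTTTTA", "affected"),
    ("ACGTTTTA", "unaffected"),
    ("ACCTTTTA", "affected"),
    ("ACGTTTTA", "affected"),
]
print(count_subjects_per_node(observed))
```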

“Phylogenetic” means related to the study of evolutionary connections among various groups of organisms or individuals within a species. Before genetic information was readily available, phylogeny was based mostly on phenotypic observation. “Phylogenetic mapping” as used herein means using DNA sequence data to connect related sequence variants carried by a plurality of individuals in order to determine evolutionary connections and the chronology of divergence. A “phylogenetic tree” is the result of mapping the connections between variants.

“Node” as used herein means a polymorphism data point on a phylogenetic tree representing an actual variant sequence carried by at least one subject. A node is connected by a branch to another node representing a variant sequence carried by the same individual on the same chromosome and in the same cistron but at a different genetic locus within the cistron. The presence of a node indicates that at least one subject carried both the sequence indicated by the node as well as the sequence represented by the neighboring node to which it is connected by a branch.

“Branch” as used herein means a connection between two nodes representing two distinct variant sequences or haplotypes, wherein the two variants are located on the same chromosome and in the same cistron from an individual subject. “Branching point” means any node from which more than two branches extend, but it is especially used herein to refer to a root node from which three or more nodes extend. A “root node” represents the genetic sequence of a common evolutionary ancestor from which genetic divergence has generated the variety of nearby sequence variants represented by the connected nodes.

“Iteratively” as used herein refers to repetitive calculation of values for each character in a series. For example, each node on a phylogenetic tree is analyzed to calculate the ratio of the number of subjects affected with a condition of interest (such as Alzheimer's disease) to control unaffected subjects; this ratio is compared with the connected nodes to locate correlations with increased or decreased risk for developing a disease, disorder, or condition of interest. A substantial change in this ratio between one node and the next indicates the presence of a variant that either increases or decreases the risk of earlier disease onset. “Iteratively examining the genetic variants” means beginning the analysis with nodes representing the sequences shared by the greatest numbers of individual subjects and successively analyzing the nodes connected by branches extending from that node, followed by the second level of nodes, and so on. The analysis then moves overall from the roots of the tree toward the outer branches and nodes of the tree.
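One possible way to realize this order of examination in software is a breadth-first traversal that starts at the node shared by the greatest number of subjects and proceeds level by level toward the periphery, computing the ratio of affected to unaffected subjects at each node. The Python sketch below is illustrative only; the adjacency-list representation, the node labels, and the counts are hypothetical.

```python
from collections import deque


def examine_iteratively(branches, counts):
    """Visit nodes level by level, starting from the most populated node.

    branches: dict mapping each node to the nodes it is connected to.
    counts:   dict mapping each node to (affected, unaffected) counts.
    Returns a list of (node, previous_node, ratio) in visiting order.
    """
    start = max(counts, key=lambda node: sum(counts[node]))
    seen = {start}
    queue = deque([(start, None)])
    visited = []
    while queue:
        node, previous = queue.popleft()
        affected, unaffected = counts[node]
        # Ratio of affected to unaffected subjects at this node.
        ratio = affected / unaffected if unaffected else float("inf")
        visited.append((node, previous, ratio))
        for neighbor in branches.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, node))
    return visited


# Hypothetical tree: node "R" carries the most subjects.
branches = {"R": ["A", "B"], "A": ["R", "A1"], "B": ["R"], "A1": ["A"]}
counts = {"R": (40, 40), "A": (30, 10), "B": (5, 20), "A1": (12, 3)}
for node, previous, ratio in examine_iteratively(branches, counts):
    print(node, previous, ratio)
```

In this hypothetical data set the ratio rises from 1.0 at node R to 3.0 at node A and falls to 0.25 at node B, which is the kind of change between connected nodes that the definition describes.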

“Treatment” as used herein includes any drug, procedure, lifestyle change, or other adjustment introduced in an attempt to effect a change in a particular aspect of a subject's health (i.e., directed to a particular disease, disorder, or condition).

“Drug” as used herein refers to a chemical entity or biological product, or combination of chemical entities or biological products, administered to a person to treat or prevent or control a disease or condition. The term “drug” as used herein is synonymous with the terms “medicine,” “medicament,” “therapeutic intervention,” or “pharmaceutical product.” Most preferably the drug is approved by a government agency for treatment of at least one specific disease or condition.

“Disease,” “disorder,” and “condition” are commonly recognized in the art and designate the presence of signs and/or symptoms in an individual or patient that are generally recognized as abnormal and/or undesirable. Diseases or conditions may be diagnosed and categorized based on pathological changes. The disease or condition may be selected from the types of diseases listed in standard texts such as Harrison's Principles of Internal Medicine, 1997, or Robbins Pathologic Basis of Disease, 1998.

“Mitochondrial dysfunction” as used herein means any detrimental abnormalities of the mitochondria within a cell or cells. Some diseases, disorders, or conditions presently known in the art to be associated with mitochondrial dysfunction include Alzheimer's disease, Parkinson's disease, and other neurodegenerative diseases, ischemia-reperfusion injury in stroke and heart attack, epilepsy, diabetes, and aging. Many other diseases, disorders, and conditions have been associated with mitochondrial dysfunction in the art. Indeed, the mitochondrion is critical for proper functioning of most cell types, and mitochondrial decline often leads to cell death. This mitochondrial dysfunction causes cell damage and death by compromising ATP production, disrupting calcium homeostasis and increasing oxidative stress. Furthermore, mitochondrial damage can lead to apoptotic cell death by causing the release of cytochrome c and other pro-apoptotic factors into the cytoplasm (for review, see Wallace, 1999; Schapira, 2006). Regarding a specific example found herein, the ApoE 3 and ApoE 4 isoforms are hypothesized to cause mitochondrial dysfunction through interactions with TOMM40. Some TOMM40 variants may act synergistically with the ApoE 3 isoform to accelerate mitochondrial decline. This mitochondrial mechanism is believed to contribute to many complex genetic diseases, disorders, and conditions.

“Subjects” as used herein are preferably, but not limited to, human subjects. The subjects may be male or female and may be of any race or ethnicity, including, but not limited to, Caucasian, African-American, African, Asian, Hispanic, Indian, etc. The subjects may be of any age, including newborn, neonate, infant, child, adolescent, adult, and geriatric. Subjects may also include animal subjects, particularly mammalian subjects such as canines, felines, bovines, caprines, equines, ovines, porcines, rodents (e.g., rats and mice), lagomorphs, primates (including non-human primates), etc., screened for veterinary medicine or pharmaceutical drug development purposes.

“Treat,” “treating,” or “treatment” as used herein refers to any type of measure that imparts a benefit to a patient afflicted with a disease, including improvement in the condition of the patient (e.g., in one or more symptoms), delay in the onset or progression of the disease, etc.

“Late-onset Alzheimer's disease” or “LOAD” as used herein is known in the art, and is the classification used if the Alzheimer's disease has an onset or is diagnosed after the age of 65. It is the most common form of Alzheimer's disease.

2. Methods for Identifying Genetic Variants

While lists of associations derived from genome-wide scans are useful, they are generally inadequate to explain disease complexity. Families, pathways, and interactions of genes can provide specificities. High-resolution variance mapping may reveal answers to complex genetic interactions. This is particularly applicable where one known genetic risk factor does not itself entirely explain an association with the disease, disorder, or condition of interest; such a risk factor may present an excellent candidate genetic locus for more detailed investigations. Furthermore, pharmacogenetics, while useful for drug development, can also extend biological relevance. The analysis of sequence data from large numbers of individuals to discover variances in the gene sequence between individuals in a population will result in detection of a greater fraction of all the variants in the population.

The initial sequence information to be analyzed by the method of the present invention is derived from the genomic DNA of a plurality of subjects. The organism can be any organism for which multiple sequences are available, but is preferably from human. In identifying new variances it is often useful to screen different population groups based on race, ethnicity, gender, and/or geographic origin because particular variances may differ in frequency between such groups. Most preferably, for diseases or disorders believed to be multigenic (genetically complex diseases/disorders), the phenotypes represented by the subject population are from the extremes of a spectrum. Biological samples containing DNA may be blood, semen, cheek swab, etc. Isolation of DNA from such samples is well known in the art.

In some embodiments, the invention relates to the analysis of nucleotide sequence data from a plurality of subjects having at least one known risk factor for a given disease, disorder, or condition (genetic or otherwise). The nucleotide sequences are analyzed to generate haplotype data, and the haplotypes or genetic variants are then mapped onto a phylogenetic tree to demonstrate the evolution of the sequences represented. By comparing this tree to phenotype data about the plurality of subjects, a prognosis or diagnosis is possible for an individual subject carrying haplotypes observed on the phylogenetic tree.

In other embodiments, the invention relates to the fields of pharmacogenetics and pharmacogenomics and the use of genetic haplotype information to predict an individual's susceptibility to disease and/or their response to a particular drug or drugs, so that drugs tailored to genetic differences of population groups may be developed and/or administered to individuals with the appropriate genetic profile.

Nucleotide sequence information is derived from genomic DNA. Genomic sequence data used may be obtained from clinical or non-human animals or from cultured cells or isolated tissue studies. The organism can be any organism for which multiple sequences are available, but is preferably from human. In identifying new variances it is often useful to screen different population groups based on race, ethnicity, gender, and/or geographic origin because particular variances may differ in frequency between such groups. Most preferably, for diseases or disorders believed to be multigenic (genetically complex diseases/disorders), the phenotypes represented by the subject population are extreme opposites.

Biological samples containing DNA may be blood, semen, cheek swab, etc. Isolation of DNA from such samples is well known in the art. Methods for determining DNA sequence at a particular genetic locus of interest are also known in the art. Automated sequencing is now widely available and requires only an isolated DNA sample and at least one primer that is specifically designed to recognize a highly conserved sequence within or in close proximity to the genetic locus of interest.

According to some embodiments, a defined genetic region or locus of interest (e.g., defined by a set of forward and reverse PCR primers) is carefully sequenced from a cohort of people inclusive of patients who are well characterized for a particular disorder.

A consensus sequence is determined, and all observed sequence variants for a given genetic locus are compiled into a list. Loci having the greatest number of observed variants represent evolutionary divergence from a common ancestor. As such, these loci are connected in cis to loci having only one or very few observed variants. During initial phases of investigation at least, it is preferred that populations be parsed into groups of subjects sharing a common general phenotype representing similar ancestry. Otherwise, analysis of these data through construction of phylogenetic trees will require a prohibitively large number of subjects.
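For illustration, a consensus sequence and a list of observed variants can be derived from sequences that have already been aligned to a common length by taking the most frequent character in each alignment column. The Python sketch below is a simplified, hypothetical example: it assumes pre-aligned input with "-" marking gaps, resolves ties in favor of the character seen first, and uses placeholder sequences that do not come from the present disclosure.

```python
from collections import Counter


def consensus_and_variants(aligned):
    """Return (consensus, variants) for equal-length aligned sequences.

    variants is a list of (position, consensus_base, variant_base, count)
    tuples with 1-based positions; "-" denotes an alignment gap.
    """
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    variants = []
    for pos in range(length):
        column = Counter(seq[pos] for seq in aligned)
        # Most frequent character; ties go to the character seen first.
        major = column.most_common(1)[0][0]
        consensus.append(major)
        for base, count in sorted(column.items()):
            if base != major:
                variants.append((pos + 1, major, base, count))
    return "".join(consensus), variants


# Hypothetical aligned reads from four chromosomes.
reads = ["ACGTTTTA", "ACGTTTTA", "ACCTTTTA", "ACGTT-TA"]
consensus, variants = consensus_and_variants(reads)
print(consensus)
print(variants)
```

Counting the entries per position in the returned list gives the number of observed variants at each site, which is the quantity the preceding paragraph uses to distinguish highly variable loci from loci having only one or very few observed variants.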

3. Multiple Sequence Alignments

Determining the presence of a particular variance or plurality of variances in a gene or gene region in a population can be performed in a variety of ways, all of which involve locating a particular genetic locus by targeting sequences within the region of interest that are known to be highly conserved. From the highly conserved locus, the contiguous sequences are easily obtained through one of many techniques well-known in the art.

The first step in analyzing parallel DNA sequences from a plurality of subjects is multiple sequence alignment (“MSA”). MSA is typically used to display sequence alignment from homologous samples with polymorphic differences within genes or gene regions to show conserved areas and variant sequences. MSAs of the sequence information obtained at the locus of interest may be constructed using one or more of various known techniques and publicly available software, which can be obtained from many sources including the Internet. Methods for analyzing multiple sequence alignments known in the art include, e.g., those described in U.S. Pat. No. 6,128,587 to Sjolander; U.S. Pat. No. 6,291,182 to Schork et al.; and U.S. Pat. No. 6,401,043 to Stanton et al.

4. Phylogenetic Trees and Analysis

Various methods for construction of “phylogenetic trees” are known in the art (see, e.g., Sanderson, 2008). Sun et al. used “haplotype block” analyses to study associations between toll-like receptor (TLR) variants and prostate cancer (2005) and Bardel et al. (2005) used a cladistic analysis approach to investigate associations between CARD15 gene variants and Crohn's disease. However, neither utilized genetic loci previously associated with the disease to investigate linkages.

Phylogenetic trees according to some embodiments may be constructed with a topology in which haplotype sequence variants observed in individual human subjects studied form nodes (representing each sequence observed in the data) on a tree. Nodes may be joined to other nodes, and the common ancestor is found at the branching site, common root or root node of the tree. A phylogenetic tree reflects the evolutionary relationship between genetic loci for which data are analyzed (see Sanderson, 2008; Tzeng, 2005; Seltman, 2003). FIG. 4 shows a detailed phylogenetic tree constructed for Region B of the genetic locus shown in FIG. 3.

The starting point for phylogenetic tree estimation is generally an MSA (see above). Multiple software applications are available for constructing phylogenetic trees based on sequence data. See, e.g., U.S. Pat. Nos. 7,127,466 and 6,532,467 to Brocklebank, et al. The basic premise is that a genetic locus exhibiting many variants is represented by these variants connected in cis. Polymorphisms create branching points (nodes) in the tree that define groups of related sequences or haplotypes.

The phylogenetic tree is utilized for information by iteratively examining ratios of subjects affected with a condition to unaffected control subjects; the calculations begin with nodes observed in the greatest numbers of subjects and move toward the periphery of the tree to nodes observed in fewer subjects. The goal is to locate a branching point, branch, or node where there is substantial change in the ratio of subjects affected with the condition of interest to unaffected control subjects. Such a branching point represents the evolutionary divergence of higher risk subjects from lower risk subjects or vice versa.

Statistical analysis of the phylogenetic tree generated may be performed in accordance with the methods known in the art. One art-recognized method is the calculation of bootstrap confidence levels (see Efron et al., Proc. Natl. Acad. Sci. USA 93, 13429-13434 (1996)).
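The iterative examination of affected-to-unaffected ratios described above may be sketched, for illustration only, as a walk over a tree of clades that visits the most populated clades first. The `Clade` class, the `min_subjects` cutoff, and the use of the affected fraction (cases divided by cases plus controls) in place of a raw ratio, chosen here to avoid division by zero, are all assumptions of this sketch rather than features of the described method.

```python
from dataclasses import dataclass, field


@dataclass
class Clade:
    """Hypothetical tree node: one observed sequence plus its descendants."""
    name: str
    cases: int = 0       # affected subjects carrying this node's own sequence
    controls: int = 0    # unaffected subjects carrying this node's own sequence
    children: list = field(default_factory=list)

    def totals(self):
        """Cases and controls in this node and in everything below it."""
        cases, controls = self.cases, self.controls
        for child in self.children:
            c, u = child.totals()
            cases += c
            controls += u
        return cases, controls


def largest_shift(root, min_subjects=10):
    """Find the parent-to-child branch with the largest change in the
    fraction of affected subjects, visiting larger clades first."""
    best = None
    pending = [root]
    while pending:
        # Start with the clade observed in the greatest number of subjects.
        pending.sort(key=lambda clade: sum(clade.totals()), reverse=True)
        parent = pending.pop(0)
        p_cases, p_controls = parent.totals()
        if p_cases + p_controls == 0:
            continue
        parent_frac = p_cases / (p_cases + p_controls)
        for child in parent.children:
            c_cases, c_controls = child.totals()
            n = c_cases + c_controls
            if n < max(min_subjects, 1):
                continue  # too few subjects to compare (assumed cutoff)
            child_frac = c_cases / n
            shift = abs(child_frac - parent_frac)
            if best is None or shift > best["shift"]:
                best = {"parent": parent.name, "child": child.name,
                        "parent_fraction": parent_frac,
                        "child_fraction": child_frac, "shift": shift}
            pending.append(child)
    return best
```

The returned branch is only a candidate; whether the change is "substantial" would still be assessed statistically, for example by the bootstrap methods noted in this section.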

5. Patient Evaluation

Once a phylogenetic tree has been generated for a particular genetic locus, an individual subject may be evaluated by comparing their DNA sequence to the sequences that comprise the phylogenetic tree. The presence of haplotypes or sequence variants corresponding with regions of the tree representing subjects with higher incidence of the condition of interest (i.e., higher ratios of subjects affected with the disease or disorder to unaffected control subjects) would mean that the individual subject is also at increased risk. Conversely, substantially lower ratios correspond to reduced risk of developing the condition of interest.

Phylogenetic trees may also be analyzed based upon responsiveness of the condition of interest to treatment with an active agent or treatment method of interest according to some embodiments.
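For illustration only, the comparison of an individual subject's haplotypes against regions of a previously constructed tree may be sketched as a lookup. The mapping names, the case/control counts supplied by the caller, and the `fold` threshold used to label a region as higher or lower risk are hypothetical choices made for this example.

```python
def evaluate_subject(haplotypes, haplotype_to_clade, clade_counts,
                     cohort_counts, fold=2.0):
    """Label each of a subject's haplotypes relative to the whole cohort.

    clade_counts: clade name -> (affected, unaffected) subject counts.
    cohort_counts: (affected, unaffected) counts for the whole cohort.
    fold: assumed threshold; a clade ratio at least `fold` times the cohort
    ratio is labelled "higher", at most 1/fold times is labelled "lower".
    """
    cohort_cases, cohort_controls = cohort_counts
    cohort_ratio = cohort_cases / cohort_controls
    report = []
    for hap in haplotypes:
        clade = haplotype_to_clade.get(hap)
        if clade is None:
            report.append((hap, None, "not on tree"))
            continue
        cases, controls = clade_counts[clade]
        ratio = cases / controls if controls else float("inf")
        if ratio >= fold * cohort_ratio:
            label = "higher"
        elif ratio <= cohort_ratio / fold:
            label = "lower"
        else:
            label = "similar"
        report.append((hap, clade, label))
    return report
```

A haplotype that does not appear on the tree is reported as such rather than being assigned to the nearest clade, which keeps the sketch from implying a placement method that the text does not describe.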

6. APOE and TOMM40

ApoE phenotypes and genotypes are well known in the art. The established nomenclature system as well as the phenotypes and genotypes for ApoE are described in, for example, Zannis et al., 1982, which is incorporated by reference herein.

TOMM40 (The Outer Mitochondrial Membrane channel subunit, 40 kDa) phenotypes and genotypes are also known. TOMM40 functions as a channel-forming subunit of the translocase found in mitochondrial membrane that is essential for protein import into mitochondria.

Genome-wide association scanning data from studies of Alzheimer's disease patients have unequivocally identified the linkage disequilibrium region that contains the apolipoprotein E (ApoE) gene. The ApoE 4 variant has been widely replicated as a confirmed susceptibility gene since the initial publications in 1993 (see, e.g., Corder et al.). However, the genome-wide association scanning data resulted in a remarkable “coincidence” observed in cell biology studies involving the co-localization of ApoE and TOMM40 to the outer mitochondrial membrane. This other gene, TOMM40, was first encountered during studies modeling linkage disequilibrium around ApoE in 1998. The polymorphisms were located adjacent to ApoE within a small linkage disequilibrium region.

ApoE co-localizes to the outer mitochondrial membrane, suggesting isoform-specific interactions leading to a potential role for ApoE-induced mitochondrial apoptosis as an early step in Alzheimer's disease expression. Biological data have demonstrated that the proportion of mobile mitochondria in neuronal cell culture, as well as the speed at which they move and the distance that they traverse, are factors affecting increased mitochondrial apoptosis. Phylogenetic data suggest an independent genetic effect on the development of Alzheimer's disease for TOMM40.

ApoE binds specifically to mitochondria in human neuronal cultures (Chang, 2005), and sequencing of this linkage disequilibrium region in hundreds of Alzheimer's disease patients and matched controls, combined with mapping the genetic variant evolution of TOMM40, defines a region of particular interest for ApoE-TOMM40 interactions, as shown in FIG. 3. These evolutionary data further support the genetic association between ApoE and TOMM40, and suggest that mitochondrial dysfunction could be responsible for neuronal death occurring slowly over many years. The age of onset distribution for Alzheimer's disease (see, e.g., U.S. Pat. No. 6,027,896 to Roses et al.) might reflect the inheritance of tightly linked variants of two biochemically interacting proteins that lead to the clinical expression of disease.

As detailed herein, the interaction between multiple haplotypes of TOMM40 variants and ApoE alleles contributes to Alzheimer's disease pathogenesis; in particular, haplotypes of TOMM40 in linkage to the E 3 allele of ApoE contribute to disease pathogenesis. Several of the TOMM40 gene variants evolved only cis-linked to ApoE 3. (Similarly, specific TOMM40 variants may have evolved cis-linked to ApoE 4 or ApoE 2.) Thus, any added genetic effect of the TOMM40 variants segregates independently of ApoE 4, but the two variant protein products may functionally interact, in trans, to produce a given observable phenotype or trait. This “coincidence” of adjacent interacting genes may account for the extraordinarily significant statistical association data found in all Alzheimer's disease genome-wide association scanning studies. It is of interest to note that the initial commercially available genome-wide association scanning platforms did not contain any ApoE polymorphisms; the association was instead identified with TOMM40 and ApoC1 SNPs, yet the region is virtually always referred to as the “ApoE region.”

These data, which combine disease genetics and putative molecular mechanisms of pathogenesis, can also be viewed within a pharmacogenetics context. Because of the strong genetic effect of inheriting an ApoE 4 allele, ApoE 4 has been referred to as a complex susceptibility gene for more than a decade. Consistent replications of the age of onset distributions as a function of ApoE genotype confirm that the role of ApoE 3 inheritance is not totally benign, but is a lower risk factor observed at a slower rate of disease onset. There are genetic variants of TOMM40 that are located only on DNA strands containing ApoE 3 in the linkage disequilibrium regions (Roses et al., unpublished data), and thus not in Hardy-Weinberg equilibrium as was required for SNPs in genome-wide association panels. Evolutionary changes in TOMM40 sequences that are cis-linked only to ApoE 3 act to increase the risk of Alzheimer's disease associated with ApoE 3, while other variants of TOMM40 cis-linked to ApoE 3 decrease the risk associated with ApoE 3. An independent genetic test would be to determine whether those TOMM40 polymorphisms associated with less Alzheimer's disease segregate at a later age in age of onset distribution plots for ApoE 3 containing genotypes [ApoE 3/3 or ApoE 4/3].

Detecting the presence or absence of ApoE 2, 3 or 4, and/or TOMM40 haplotypes or of DNA encoding the same (including, in some embodiments, the number of alleles for each) in a subject may be carried out either directly or indirectly by any suitable means. A variety of techniques are known to those skilled in the art. All generally involve the step of collecting a sample of biological material containing either DNA or protein from the subject, and then detecting whether or not the subject possesses the haplotype of interest. For example, the detecting step with respect to ApoE may be carried out by collecting an ApoE sample from the subject (for example, from cerebrospinal fluid, or any other fluid or tissue containing ApoE), and then determining the presence or absence of an ApoE 2, 3, or 4 isoform in the ApoE sample (e.g., by isoelectric focusing or immunoassay).

Determining the presence or absence of DNA encoding an ApoE and/or TOMM40 isoform may be carried out by direct sequencing of the genomic DNA region of interest, with an oligonucleotide probe labeled with a suitable detectable group, and/or by means of an amplification reaction such as a polymerase chain reaction or ligase chain reaction (the product of which amplification reaction may then be detected with a labeled oligonucleotide probe or a number of other techniques). Further, the detecting step may include the step of detecting whether the subject is heterozygous or homozygous for the gene encoding an ApoE and/or TOMM40 haplotype. Numerous different oligonucleotide probe assay formats are known which may be employed to carry out the present invention. See, e.g., U.S. Pat. No. 4,302,204 to Wahl et al.; U.S. Pat. No. 4,358,535 to Falkow et al.; U.S. Pat. No. 4,563,419 to Ranki et al.; and U.S. Pat. No. 4,994,373 to Stavrianopoulos et al. (applicants specifically intend that the disclosures of all U.S. Patent references cited herein be incorporated herein by reference).

In some embodiments, detection may include multiplex amplification of the DNA (e.g., allele-specific fluorescent PCR). In some embodiments, detection may include hybridization to a microarray (a chip, beads, etc.). In some embodiments, detection may include sequencing appropriate portions of the gene containing the haplotypes sought to be detected. In some embodiments, haplotypes that change susceptibility to digestion by one or more endonuclease restriction enzymes may be used for detection. For example, restriction fragment length polymorphism (RFLP), which refers to the digestion pattern when various restriction enzymes are applied to DNA, may be used. In some embodiments, the presence of one or more haplotypes can be determined by allele specific amplification. In some embodiments, the presence of haplotypes can be determined by primer extension. In some embodiments, the presence of haplotypes can be determined by oligonucleotide ligation. In some embodiments, the presence of haplotypes can be determined by hybridization with a detectably labeled probe. See, e.g., U.S. Patent Application Publication No. 2008/0153088 to Sun et al.; Kobler et al., Identification of an 11T allele in the polypyrimidine tract of intron 8 of the CFTR gene, Genetics in Medicine 8(2):125-8 (2006); Costa et al., Multiplex Allele-Specific Fluorescent PCR for Haplotyping the IVS8 (TG)m(T)n Locus in the CFTR Gene, Clin. Chem., 54:1564-1567 (2008); Johnson et al., A Comparative Study of Five Technologically Diverse CFTR Testing Platforms, J. Mol. Diagnostics, 9(3) (2007); Pratt et al., Development of Genomic Reference Materials for Cystic Fibrosis Genetic Testing, J. Mol. Diagnostics, 11:186-193 (2009).

Amplification of a selected, or target, nucleic acid sequence may be carried out by any suitable means on DNA isolated from biological samples. See generally D. Kwoh and T. Kwoh, 1990. Examples of suitable amplification techniques include, but are not limited to, polymerase chain reaction, ligase chain reaction, strand displacement amplification (see generally Walker et al., 1992a; Walker et al., 1992b), transcription-based amplification (see Kwoh et al., 1989), self-sustained sequence replication (or “3SR”) (see Guatelli et al., 1990), the Qβ replicase system (see Lizardi et al., 1988), nucleic acid sequence-based amplification (or “NASBA”) (see Lewis, 1992), the repair chain reaction (or “RCR”) (see Lewis, supra), and boomerang DNA amplification (or “BDA”) (see Lewis, supra). Polymerase chain reaction is currently preferred.

DNA amplification techniques such as the foregoing can involve the use of a probe, a pair of probes, or two pairs of probes which specifically bind to DNA encoding ApoE 4, but do not bind to DNA encoding ApoE 2 or ApoE 3 under the same hybridization conditions, and which serve as the primer or primers for the amplification of the ApoE 4 DNA or a portion thereof in the amplification reaction. Likewise, one may use a probe, a pair of probes, or two pairs of probes which specifically bind to DNA encoding ApoE 2, but do not bind to DNA encoding ApoE 3 or ApoE 4 under the same hybridization conditions, and which serve as the primer or primers for the amplification of the ApoE 2 DNA or a portion thereof in the amplification reaction; and one may use a probe, a pair of probes, or two pairs of probes which specifically bind to DNA encoding ApoE 3, but do not bind to DNA encoding ApoE 2 or ApoE 4 under the same hybridization conditions, and which serve as the primer or primers for the amplification of the ApoE 3 DNA or a portion thereof in the amplification reaction.

Similarly, one may use a probe, a pair of probes, or two pairs of probes which specifically bind to DNA encoding a TOMM40 haplotype of interest, but do not bind to other TOMM40 haplotypes under the same hybridization conditions, and which serve as the primer or primers for the amplification of the TOMM40 DNA or a portion thereof in the amplification reaction.

In general, an oligonucleotide probe which is used to detect DNA encoding ApoE and/or TOMM40 haplotypes is an oligonucleotide probe which binds to DNA encoding the haplotype of interest, but does not bind to DNA encoding other haplotypes under the same hybridization conditions. The oligonucleotide probe is labeled with a suitable detectable group, such as those set forth below in connection with antibodies.

Polymerase chain reaction (PCR) may be carried out in accordance with known techniques. See, e.g., U.S. Pat. Nos. 4,683,195; 4,683,202; 4,800,159; and 4,965,188. In general, PCR involves, first, treating a nucleic acid sample (e.g., in the presence of a heat stable DNA polymerase) with one oligonucleotide primer for each strand of the specific sequence to be detected under hybridizing conditions so that an extension product of each primer is synthesized which is complementary to each nucleic acid strand, with the primers sufficiently complementary to each strand of the specific sequence to hybridize therewith so that the extension product synthesized from each primer, when it is separated from its complement, can serve as a template for synthesis of the extension product of the other primer, and then treating the sample under denaturing conditions to separate the primer extension products from their templates if the sequence or sequences to be detected are present. These steps are cyclically repeated until the desired degree of amplification is obtained. Detection of the amplified sequence may be carried out by adding to the reaction product an oligonucleotide probe capable of hybridizing to the reaction product (e.g., an oligonucleotide probe of the present invention), the probe carrying a detectable label, and then detecting the label in accordance with known techniques, or by direct visualization on a gel.

When PCR conditions allow for amplification of all ApoE allelic types, the types can be distinguished by hybridization with allele-specific probes, by restriction endonuclease digestion, by electrophoresis on denaturing gradient gels, or other techniques. A PCR protocol for determining the ApoE genotype is described in Wenham et al. (1991), incorporated by reference herein. Examples of primers effective for amplification and identification of the ApoE isoforms are described therein. Primers specific for the ApoE polymorphic region (whether ApoE 4, E3 or E2) can be employed. In Wenham, for example, PCR primers are employed which amplify a 227 bp region of DNA that spans the ApoE polymorphic sites (codons 112 and 158, which contain nucleotides 3745 and 3883). The amplified fragments are then subjected to digestion with the restriction endonuclease CfoI, which provides different restriction fragments from the six possible ApoE genotypes which may be recognizable on an electrophoresis gel. See also, Hixon et al. (1990); Houlston et al. (1989); Wenham et al. (1991); and Konrula et al. (1990) for additional methods, all of which are incorporated by reference herein.

In addition to Alzheimer's disease, there are several other genetically complex diseases and disorders for which the methods of the present invention provide advantages over existing analyses. For example, data from multiple type 2 diabetes mellitus genetic studies support the view that very large clinical case/control series will be necessary to provide statistical significance for loci defined by genome-wide association studies.
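For illustration only, the relationship between the two polymorphic codons and the six possible ApoE genotypes may be sketched as follows. The residue assignments used (E2: Cys112/Cys158; E3: Cys112/Arg158; E4: Arg112/Arg158) are the commonly reported definitions of the three isoforms; the function names are hypothetical, and the sketch assumes the residues at the two codons are known for each chromosome separately (i.e., phased), which a restriction digest pattern alone does not directly provide.

```python
from itertools import combinations_with_replacement

# Residues at codons 112 and 158 that define each common ApoE isoform.
ISOFORM_BY_RESIDUES = {
    ("Cys", "Cys"): "E2",
    ("Cys", "Arg"): "E3",
    ("Arg", "Arg"): "E4",
}


def isoform(res112, res158):
    """Return the isoform name for one allele, or raise ValueError for a
    residue combination outside the three common isoforms."""
    try:
        return ISOFORM_BY_RESIDUES[(res112, res158)]
    except KeyError:
        raise ValueError(
            f"no common isoform for codon 112={res112}, codon 158={res158}"
        ) from None


def genotype(allele_a, allele_b):
    """Combine two phased alleles, each a (res112, res158) pair."""
    first, second = sorted((isoform(*allele_a), isoform(*allele_b)))
    return f"{first}/{second}"


# The six possible genotypes formed from three alleles.
SIX_GENOTYPES = [
    f"{a}/{b}"
    for a, b in combinations_with_replacement(("E2", "E3", "E4"), 2)
]
```

For example, `genotype(("Cys", "Arg"), ("Arg", "Arg"))` corresponds to a subject carrying one E3 and one E4 allele.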

7. Active Agents, Compositions and Treatment

As noted above, phylogenetic trees created using the methods detailed herein may also be analyzed based upon responsiveness of the condition of interest to treatment with an active agent or treatment method of interest according to some embodiments, and treatment decisions for a subject or patient may be based upon specific genetic variants identified.

Active agents. Active agents include those known for treatment of a condition of interest, and are inclusive of anti-Alzheimer's disease active agents, including, but not limited to, acetylcholinesterase inhibitors, NMDA receptor antagonists, and peroxisome proliferator-activated receptor (PPAR) agonists or modulators, including but not limited to those drugs in the thiazolidinedione or glitazar classes. The active agent could also be a biopharmaceutical product, for example an antibody (e.g., monoclonal, polyclonal, derivatives of or modified antibodies such as Domain Antibodies™, Bapineuzumab, etc.), fusion proteins or therapeutic RNA molecules. The active agent could also be a combination of any of these products.

Examples of acetylcholinesterase inhibitors include, but are not limited to, donepezil (commercially available as ARICEPT), galantamine (commercially available as RAZADYNE), and rivastigmine (commercially available as EXELON) and the pharmaceutically acceptable salts thereof. Additional examples include, but are not limited to, those described in U.S. Pat. Nos. 6,303,633; 5,965,569; 5,595,883; 5,574,046; and 5,171,750 (the disclosures of all U.S. Patent references cited herein are to be incorporated by reference herein in their entirety).

Examples of NMDA receptor antagonists include, but are not limited to, memantine (commercially available as AKATINOL, AXURA, EBIXIA/ABIXIA, MEMOX and NAMENDA) and the pharmaceutically acceptable salts thereof. Additional examples include, but are not limited to, those described in U.S. Pat. Nos. 6,956,055; 6,828,462; 6,642,267; 6,432,985; and 5,990,126.

Examples of thiazolidinediones include, but are not limited to,rosiglitazone (commercially available as AVANDIA) and thepharmaceutically acceptable salts thereof. Additional examples include,but are not limited to:5-(4-[2-(N-methyl-N-(2-benzothiazolyl)amino)ethoxy]benzyl)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-(2-benzothiazolyl)amino)ethoxy]benzylidene)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-(2-benzoxazolyl)amino)ethoxy]benzyl)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-(2-benzoxazolyl)amino)ethoxy]benzylidene)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-(2-pyrimidinyl)amino)ethoxy]benzyl)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-(2-pyrimidinyl)amino)ethoxy]benzylidene)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-[2-(4,5-dimethylthiazolyl)]amino)ethoxy]benzyl)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-[2-(4,5-dimethylthiazolyl)]amino)ethoxy]benzylidene)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-(2-thiazolyl)amino)ethoxy]benzyl)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-(2-thiazolyl)amino)ethoxy]benzylidene)-2,4-thiazolidinedione;5-[4-(2-(N-methyl-N-(2-(4-phenylthiazolyl))amino)ethoxy)benzyl]-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-(2-(4-phenylthiazolyl))amino)ethoxy]benzylidene)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-[2-(4-phenyl-5-methylthiazolyl)]amino)ethoxy]benzyazolidinedione;5-(4-[2-(N-methyl-N-[2-(4-phenyl-5-methylthiazolyl)]amino)ethoxy]benzylidene)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-[2-(4-methyl-5-phenylthiazolyl)]amino)ethoxy]benzyl)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-[2-(4-methyl-5-phenylthiazolyl)]amino)ethoxy]benzylidene)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-[2-(4-methylthiazolyl)]amino)ethoxy]benzyl)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-[2-(4-methylthiazolyl)]amino)ethoxy]benzylidene)-2,4-thiazolidinedione;5-[4-(2-(N-methyl-N-[2-(5-phenyloxazolyl)]amino)ethoxy)benzyl]-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-[2-(5-phenyloxazolyl)]amino)ethoxy]benzylidene)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-[2-(4,5-dimethyloxazolyl
)]amino)ethoxy]benzyl)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-[2-(4,5-dimethyloxazolyl)]amino)ethoxy]benzylidene)-2,4-thiazolidinedione;5-[4-(2-(2-pyrimidinylamino)ethoxy)benzyl]-2,4-thiazolidinedione;5-[4-(2-(2-pyrimidinylamino)ethoxy)benzylidene]-2,4-thiazolidinedione;5-(4-[2-(N-acetyl-N-(2-pyrimidinyl)amino)ethoxy]benzyl)-2,4-thiazolidinedione;5-(4-(2-(N-(2-benzothiazolyl)-N-benzylamino)ethoxy)benzylidene)-2,4-thiazolidinedione;5-(4-(2-(N-(2-benzothiazolyl)-N-benzylamino)ethoxy)benzyl)-2,4-thiazolidinedione;5-(4-[3-(N-methyl-N-(2-benzoxazolyl)amino)propoxy]benzyl)-2,4-thiazolidinedione;5-(4-[3-(N-methyl-N-(2-benzoxazolyl)amino)propoxy]benzylidene)-2,4-thiazolidinedione;5-(4-[2-(N-methyl-N-(2-pyridyl)amino)ethoxy]benzyl)-2,4-thiazolidinedione;5-(4[2-(N-methyl-N-(2-pyridyl)amino)ethoxy]benzylidene)-2,4-thiazolidinedione;5-(4-[4-(N-methyl-N-(2-benzoxazolyl)amino)butoxy]benzylidene)-2,4-thiazolidinedione;5-(4-[4-(N-methyl-N-(2-benzoxazolyl)amino)butoxy]benzyl)-2,4-thiazolidinedione;5-(4-[2-(N-(2-benzoxazolyl)amino)ethoxy]benzylidene)2,4-thiazolidinedione;5-(4-[2-(N-(2-benzoxazolyl)amino)ethoxy]benzyl)-2,4-thiazolidinedione;5-(4-[2-(N-isopropyl-N-(2-benzoxazolyl)amino)ethoxy]benzyl)-2,4-thiazolidinedione,and pharmaceutically acceptable salts thereof. See, e.g., U.S. Pat. No.5,002,953.

The active agents disclosed herein can, as noted above, be prepared in the form of their pharmaceutically acceptable salts. Pharmaceutically acceptable salts are salts that retain the desired biological activity of the parent compound and do not impart undesired toxicological effects. Examples of such salts are (a) acid addition salts formed with inorganic acids, for example hydrochloric acid, hydrobromic acid, sulfuric acid, phosphoric acid, nitric acid and the like; and salts formed with organic acids such as, for example, acetic acid, oxalic acid, tartaric acid, succinic acid, maleic acid, fumaric acid, gluconic acid, citric acid, malic acid, ascorbic acid, benzoic acid, tannic acid, palmitic acid, alginic acid, polyglutamic acid, naphthalenesulfonic acid, methanesulfonic acid, p-toluenesulfonic acid, naphthalenedisulfonic acid, polygalacturonic acid, and the like; (b) salts formed from elemental anions such as chlorine, bromine, and iodine; and (c) salts derived from bases, such as ammonium salts, alkali metal salts such as those of sodium and potassium, alkaline earth metal salts such as those of calcium and magnesium, and salts with organic bases such as dicyclohexylamine and N-methyl-D-glucamine.

Active agents can be administered as prodrugs. “Prodrugs” as used herein refers to those prodrugs of the compounds of the present invention which are, within the scope of sound medical judgment, suitable for use in contact with the tissues of humans and lower animals without undue toxicity, irritation, allergic response and the like, commensurate with a reasonable risk/benefit ratio, and effective for their intended use, as well as the zwitterionic forms, where possible, of the compounds of the invention. The term “prodrug” refers to compounds that are rapidly transformed in vivo to yield the parent compound of the above formulae, for example, by hydrolysis in blood. A thorough discussion is provided in T. Higuchi and V. Stella, Prodrugs as Novel Delivery Systems, Vol. 14 of the A.C.S. Symposium Series and in Edward B. Roche, ed., Bioreversible Carriers in Drug Design, American Pharmaceutical Association and Pergamon Press, 1987, both of which are incorporated by reference herein. See also U.S. Pat. No. 6,680,299. Examples include a prodrug that is metabolized in vivo by a subject to an active drug having an activity of active compounds as described herein, wherein the prodrug is an ester of an alcohol or carboxylic acid group, if such a group is present in the compound; an acetal or ketal of an alcohol group, if such a group is present in the compound; an N-Mannich base or an imine of an amine group, if such a group is present in the compound; or a Schiff base, oxime, acetal, enol ester, oxazolidine, or thiazolidine of a carbonyl group, if such a group is present in the compound, such as described in U.S. Pat. Nos. 6,680,324 and 6,680,322.

Compositions. The active agents described above may be formulated for administration in a pharmaceutical carrier in accordance with known techniques. See, e.g., Remington, The Science And Practice of Pharmacy (9th Ed. 1995). In the manufacture of a pharmaceutical formulation according to the invention, the active compound (including the physiologically acceptable salts thereof) is typically admixed with, inter alia, an acceptable carrier. The carrier must, of course, be acceptable in the sense of being compatible with any other ingredients in the formulation and must not be deleterious to the patient. The carrier may be a solid or a liquid, or both, and is preferably formulated with the compound as a unit-dose formulation, for example, a tablet, which may contain from 0.01 or 0.5% to 95% or 99% by weight of the active compound. One or more active compounds may be incorporated in the formulations of the invention, which may be prepared by any of the well known techniques of pharmacy comprising admixing the components, optionally including one or more accessory ingredients.

The formulations of the invention include those suitable for oral, rectal, buccal (e.g., sub-lingual), vaginal, parenteral (e.g., subcutaneous, intramuscular, intradermal, or intravenous), topical (i.e., both skin and mucosal surfaces, including airway surfaces) and transdermal administration, although the most suitable route in any given case will depend on the nature and severity of the condition being treated and on the nature of the particular active compound which is being used.

Formulations suitable for oral administration may be presented in discrete units, such as capsules, cachets, lozenges, or tablets, each containing a predetermined amount of the active compound; as a powder or granules; as a solution or a suspension in an aqueous or non-aqueous liquid; or as an oil-in-water or water-in-oil emulsion. Such formulations may be prepared by any suitable method of pharmacy which includes the step of bringing into association the active compound and a suitable carrier (which may contain one or more accessory ingredients as noted above). In general, the formulations of the invention are prepared by uniformly and intimately admixing the active compound with a liquid or finely divided solid carrier, or both, and then, if necessary, shaping the resulting mixture. For example, a tablet may be prepared by compressing or molding a powder or granules containing the active compound, optionally with one or more accessory ingredients. Compressed tablets may be prepared by compressing, in a suitable machine, the compound in a free-flowing form, such as a powder or granules optionally mixed with a binder, lubricant, inert diluent, and/or surface active/dispersing agent(s). Molded tablets may be made by molding, in a suitable machine, the powdered compound moistened with an inert liquid binder.

Formulations suitable for buccal (sub-lingual) administration include lozenges comprising the active compound in a flavored base, usually sucrose and acacia or tragacanth; and pastilles comprising the compound in an inert base such as gelatin and glycerin or sucrose and acacia.

Formulations of the present invention suitable for parenteral administration comprise sterile aqueous and non-aqueous injection solutions of the active compound(s), which preparations are preferably isotonic with the blood of the intended recipient. These preparations may contain anti-oxidants, buffers, bacteriostats and solutes which render the formulation isotonic with the blood of the intended recipient. Aqueous and non-aqueous sterile suspensions may include suspending agents and thickening agents. The formulations may be presented in unit-dose or multi-dose containers, for example sealed ampoules and vials, and may be stored in a freeze-dried (lyophilized) condition requiring only the addition of the sterile liquid carrier, for example, saline or water-for-injection, immediately prior to use. Extemporaneous injection solutions and suspensions may be prepared from sterile powders, granules and tablets of the kind previously described. For example, in one aspect of the present invention, there is provided an injectable, stable, sterile composition comprising an active agent(s), or a salt thereof, in a unit dosage form in a sealed container. The compound or salt is provided in the form of a lyophilizate which is capable of being reconstituted with a suitable pharmaceutically acceptable carrier to form a liquid composition suitable for injection thereof into a subject. The unit dosage form typically comprises from about 10 mg to about 10 grams of the compound or salt. When the compound or salt is substantially water-insoluble, a physiologically acceptable emulsifying agent may be employed in sufficient quantity to emulsify the compound or salt in an aqueous carrier. One such useful emulsifying agent is phosphatidyl choline.

Formulations suitable for topical application to the skin preferably take the form of an ointment, cream, lotion, paste, gel, spray, aerosol, or oil. Carriers which may be used include petroleum jelly, lanoline, polyethylene glycols, alcohols, transdermal enhancers, and combinations of two or more thereof.

Formulations suitable for transdermal administration may be presented as discrete patches adapted to remain in intimate contact with the epidermis of the recipient for a prolonged period of time. Formulations suitable for transdermal administration may also be delivered by iontophoresis (see, for example, Pharmaceutical Research 3 (6):318 (1986)) and typically take the form of an optionally buffered aqueous solution of the active compound. Suitable formulations comprise citrate or bis-tris buffer (pH 6) or ethanol/water and contain from 0.1 to 0.2 M active ingredient.

In addition to active compound(s), the pharmaceutical compositions may contain other additives, such as pH-adjusting additives. In particular, useful pH-adjusting agents include acids, such as hydrochloric acid, bases or buffers, such as sodium lactate, sodium acetate, sodium phosphate, sodium citrate, sodium borate, or sodium gluconate. Further, the compositions may contain microbial preservatives. Useful microbial preservatives include methylparaben, propylparaben, and benzyl alcohol. The microbial preservative is typically employed when the formulation is placed in a vial designed for multidose use. Of course, as indicated, the pharmaceutical compositions of the present invention may be lyophilized using techniques well known in the art.

Dosage. The therapeutically effective dosage of any specific active agent, the use of which is in the scope of the present invention, will vary somewhat from compound to compound, and patient to patient, and will depend upon the condition of the patient and the route of delivery. For oral administration, a total daily dosage of from 1, 2 or 3 mg, up to 30, 40 or 50 mg, may be used, given as a single daily dose or divided into two or three daily doses.

Treatment. Genetic variants as described herein or discovered using the methods as taught herein may be used to determine the course of treatment of a patient afflicted with a condition (e.g., a condition associated with ApoE and/or TOMM40), by, e.g., determining which active agent and/or course of treatment to administer based upon the presence or absence of the genetic variant or variants. The presence or absence of the genetic variants may indicate efficacy of an active agent and/or course of treatment for the patient, predict age of onset for a condition, indicate preferred dose regimens, etc. A genetic profile may be generated for a patient, and the profile consulted to determine whether the patient is among a group of patients that are likely to be responsive to a particular active agent.

Instructions for use may be packaged with or otherwise associated with an active agent indicating recommendations for treatment, time to treatment, dose regimens, etc., based upon the presence or absence of the genetic variants.

8. Methods of Determining a Prediction of Disease Risk or a Prognosis

To determine a prediction of disease risk for a non-symptomatic individual or a prognosis (the prospect of affliction or disease course as anticipated from the usual course of disease or peculiarities of the case) according to some embodiments of the present invention, diagnostic data, including the patient's diagnosis or medical history and genetic data, such as the patient's genotype (e.g., ApoE and/or TOMM40 genotype), may be processed to provide therapeutic options and outcome predictions. Processing may include obtaining a “patient profile” such as the collection of a patient's medical history including age and gender, genotyping of the loci of interest (e.g., using appropriately designed primers and using an RT-PCR or PCR amplification step) and/or phenotyping (e.g., using an antibody-mediated method or enzymatic test), and statistical or other analyses that convert this raw data into a prognosis. The prognosis may include a prediction of a patient's age of disease onset, response to drug therapy, time to treatment, treatment efficacy, etc. In some embodiments, the prognosis may include the use of a computer software program to analyze patient data and run statistical cross-checks against relational databases in order to convert the patient data or profile to a prognosis.

A “patient profile” includes data and/or materials pertaining to the patient for whom the predictive and/or prognostic analysis is being performed. Data may include information on the patient's diagnosis, age, gender, and/or genotype. The patient profile may also include materials from the patient such as blood, serum protein samples, cerebrospinal fluid, or purified RNA or DNA.
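The data portion of such a patient profile can be pictured as a simple record. The following is only an illustrative sketch: the class name, field names, and example values are hypothetical and are not part of the disclosure, and no prognosis rule is implied.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PatientProfile:
    """Hypothetical container for the profile items named in the text."""
    age: int
    gender: str
    diagnosis: Optional[str] = None        # None for a non-symptomatic individual
    apoe_genotype: Optional[str] = None    # e.g., "3/4"
    tomm40_genotype: Optional[str] = None  # e.g., a description of a TOMM40 variant
    materials: List[str] = field(default_factory=list)  # e.g., "blood", "purified DNA"

    def has_genetic_data(self) -> bool:
        """True if at least one genotype of interest has been recorded."""
        return self.apoe_genotype is not None or self.tomm40_genotype is not None


# Illustrative values only, not data from any study.
profile = PatientProfile(age=68, gender="F", apoe_genotype="3/4", materials=["blood"])
print(profile.has_genetic_data())
```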

9. Genotype Stratification in Clinical Trials

Detection of a genotype taught herein or as determined with the methods herein can be used in conducting a clinical trial in like manner as other genotype information is used to conduct a clinical trial, such as described in, e.g., U.S. Pat. Nos. 6,573,049, 6,368,797, and 6,291,175.

In some embodiments, such methods advantageously stratify or permit the refinement of the patient population (e.g., by division of the population into one or more subgroups) so that advantages of particular treatment regimens can be more accurately detected, particularly with respect to particular sub-populations of patients with particular genotypes. In some embodiments, such methods comprise administering a test active agent or therapy to a plurality of subjects (a control or placebo therapy typically being administered to a separate but similarly characterized plurality of subjects) and detecting the presence or absence of a genotype (e.g., ApoE and/or TOMM40) as described above in the plurality of subjects. The genotype may be detected before, after, or concurrently with the step of administering the test therapy. The influence of one or more detected alleles on the test therapy can then be determined on any suitable parameter or potential treatment outcome or consequence, including, but not limited to, the efficacy of said therapy, lack of side effects of the therapy, etc.
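By way of a non-limiting illustration, stratification by genotype can be sketched as grouping trial outcomes by genotype subgroup and treatment arm and comparing response rates within each subgroup. The function name, subgroup labels, and records below are hypothetical and do not represent trial data.

```python
from collections import defaultdict


def response_rates_by_stratum(records):
    """Return the response rate for each (genotype, arm) stratum.

    records: iterable of (genotype, arm, responded) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # (genotype, arm) -> [responders, total]
    for genotype, arm, responded in records:
        cell = counts[(genotype, arm)]
        cell[0] += 1 if responded else 0
        cell[1] += 1
    return {key: responders / total for key, (responders, total) in counts.items()}


# Hypothetical records, for illustration only.
records = [
    ("E4-negative", "test", True), ("E4-negative", "test", True),
    ("E4-negative", "placebo", False), ("E4-negative", "placebo", True),
    ("E4-positive", "test", False), ("E4-positive", "test", True),
    ("E4-positive", "placebo", False), ("E4-positive", "placebo", True),
]
rates = response_rates_by_stratum(records)
for stratum in sorted(rates):
    print(stratum, rates[stratum])
```

A difference between arms that appears in one genotype subgroup but not another is the kind of effect that stratification is intended to expose; a real analysis would add an appropriate statistical test.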

A clinical trial can be set up to test the efficacy of test compounds to treat any number of diseases for which a particular genotype has been determined to be associated, for subjects who are diagnosed with the disease or are at risk for developing the disease. If subjects are genotyped after the completion of a clinical trial, the analyses may still be aimed at determining a relationship between a treatment for a disease and the allele to be assessed for efficacy. Alternatively, if a symptomatic or asymptomatic subject has not yet been diagnosed with the disease but has been determined to be at risk of developing the disease, a similar clinical trial to the clinical trial described above may be carried out.

The underlying biological mechanisms may also be considered when designing the treatment groups. For example, the ApoE 4 (1-272) fragment binds to mitochondria, decreases mitochondrial cellular dynamics and decreases synaptogenesis more than ApoE 3 (1-272). Rosiglitazone, a drug candidate for the treatment of Alzheimer's disease, increases mitogenesis and increases synaptogenesis, opposing the effects of ApoE fragment binding, to a greater degree for ApoE 3 than for ApoE 4. Therefore, the drug or treatment candidate (e.g., rosiglitazone) may be selected based upon an underlying mechanism of action as it relates to the genetic markers used for the stratifications (e.g., ApoE 2, E 3, E 4 and/or TOMM40 variants).

Assessment of the efficacy of a drug chosen for the trial may include monitoring the subject over a period of time and analyzing the delay of onset of the disease and the intensity of the disease at the time of onset, as well as measuring the onset of symptoms which are associated with the disease. A drug that, in a clinical trial, eliminates or delays the onset of the disease, or reduces the symptoms of the disease may be a beneficial drug to use in patients diagnosed with the disease or at risk of developing the disease. Test compounds which may be used in such trials include the agents as described above, including those previously approved for clinical use and new compounds not yet approved for use, or approved for treating a particular disease. Thus, in some embodiments the clinical trial may include the optimization of drug administration, including dosage, timing of administration, toxicities or side effects, route of administration, and efficacy of the treatment.

10. Kits Useful for the Detection of Genotype Variants at Loci of Interest

Kits for determining if a subject is at increased risk of developing a disease, developing a disease at an earlier age of onset, and/or is a candidate for a particular treatment, where the disease is associated with ApoE and/or TOMM40 (e.g., late onset Alzheimer's disease), are provided herein. The kits include at least one reagent specific for detecting the presence or absence of an ApoE and/or TOMM40 variant as described herein, and may include instructions to aid in determining whether the subject is at increased risk of developing the disease. The kit may optionally include a nucleic acid for detection of an ApoE gene (e.g., ApoE 2, ApoE 3 and/or ApoE 4) or instructions for isoelectric focusing methods for detecting the ApoE genotype; and/or a nucleic acid for detection of a TOMM40 variant as described herein. In some embodiments, the kit may optionally include one or more antibodies which bind to ApoE 2, ApoE 3, ApoE 4, or to isoforms of TOMM40. The test kit may be packaged in any suitable manner, typically with all elements in a single container along with a sheet of printed instructions for carrying out the test.

In some embodiments, the kit may optionally contain buffers, enzymes, and reagents for amplifying the genomic nucleic acids via primer-directed amplification. The kit also may include one or more devices for detecting the presence or absence of particular haplotypes in the amplified nucleic acid. Such devices may include one or more probes that hybridize to a haplotype nucleic acid, which may be attached to a bio-chip or microarray device, such as any of those described in U.S. Pat. No. 6,355,429. The bio-chip or microarray device optionally has at least one capture probe attached to a surface that can hybridize to a haplotype sequence. In preferred embodiments, the bio-chip or microarray contains multiple probes, and most preferably contains at least one probe for a haplotype sequence which, if present, would be amplified by a set of flanking primers. For example, if five pairs of flanking primers are used for amplification, the device would contain at least one haplotype probe for each amplified product, or at least five probes. The kit also preferably includes instructions for using the components of the kit.

The present invention is explained in greater detail in the following non-limiting Examples.

Example 1: Construction of Phylogenetic Trees

All of the known genome-wide scanning studies demonstrate extremely significant p values around the apolipoprotein C1 [ApoC1] locus (Mahley et al., Proc. Natl. Acad. Sci. USA 103: 5644-51 (2006); Coon et al., J. Clin. Psychiatry 68: 613-8 (2007); Li et al., Arch. Neurol. 65: 45-53 (2007)). Of equal importance is that each series identified a “favored” borderline significant candidate gene outside of the ApoE linkage disequilibrium area, but these favored candidate genes were different in each study. TOMM40 is near ApoC1 and in linkage disequilibrium with ApoE. Interactions between ApoE 3 or ApoE 4 and different TOMM40 isoforms are believed to be associated with increased or decreased risk of developing Alzheimer's disease within an earlier age range. Age of onset curves for ApoE 4/4, 3/4, 3/3, 2/4, and 2/3 genotypes are shown in FIG. 2, indicating a range of risk for earlier development of the disease, depending upon the ApoE profile. ApoE alone does not appear to explain all of the data in these age of onset curves, however.

Various methods for polymorphic profiling of Alzheimer's disease risk associated with the different ApoE alleles have been proposed (see, e.g., U.S. Application of Cox et al., No. 20060228728; U.S. Application of Li and Grupe, No. 20080051318). A phylogenetic approach to the ApoE 4 puzzle is demonstrated herein.

Biological samples, DNA isolation, amplification of loci of interest. A total of 340 subjects included 135 Alzheimer's disease cases and 99 age-matched controls in Group A as well as 57 cases and 49 controls in Group B. All subjects carried the ApoE genotypes previously associated with higher risk for earlier disease onset (i.e., 3/3, 3/4, or 4/4). Biological samples containing DNA were collected from all subjects. Genomic DNA was then isolated according to conventional methods for sequencing of genetic loci on Chromosome 19.

FIG. 3 shows the genetic regions on Chromosome 19 targeted for study using genome-wide scanning data from multiple reports. The region is encompassed within GenBank reference sequence AF050154. Software was used to generate multiple sequence alignments for variant loci (e.g., ClustalW2, European Bioinformatics Institute). Subsequently, the multiple sequence alignments were analyzed using software for developing phylogenetic trees (e.g., MEGA version 2.1, Center for Evolutionary Functional Genomics; TREEVOLVE, Department of Zoology, University of Oxford; or parsimony-based construction software such as PAUP, Sinauer Associates). Statistical analyses may be performed with, e.g., Genetic Data Analysis (GDA: Software for the Analysis of Discrete Genetic Data, The Bioinformatics Research Center of North Carolina State University). The results of Region B analysis are demonstrated in the phylogenetic tree of FIG. 4.

Each piece of data in FIG. 4 represents an observed sequence variant. These variants may be nucleotide substitutions, insertions, deletions, or microsatellites and may or may not result in detectable differences in gene expression or protein function. Each node represents a variant (or a number of variants) that occurs on more than one chromosome. Adjacent nodes define the boundaries of sequences that are in cis, and therefore more likely to be inherited as a unit, in the region of interest on a subject's chromosome. Nodes that precede the greatest number of subsequent nodes represent evolutionarily ancestral variants from which genetic divergence has occurred over time.

The presence of haplotypes or sequence variants corresponding with regions of the tree representing subjects with substantially higher incidence of Alzheimer's disease (i.e., higher ratios of subjects affected with the disease to unaffected control subjects) would mean that the individual subject is also at increased risk. Conversely, substantially lower ratios correspond to reduced risk of developing Alzheimer's disease.
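The ratio of affected subjects to unaffected controls for each region of the tree can be tallied as in the following minimal sketch. The function name, clade labels, and counts are hypothetical and serve only to illustrate the comparison.

```python
def case_control_ratios(assignments):
    """Return the case/control ratio for each tree region (clade).

    assignments: iterable of (clade, status) pairs, status "case" or "control".
    """
    tally = {}
    for clade, status in assignments:
        cases, controls = tally.get(clade, (0, 0))
        if status == "case":
            cases += 1
        elif status == "control":
            controls += 1
        else:
            raise ValueError(f"unknown status: {status!r}")
        tally[clade] = (cases, controls)
    # A clade with no controls has an undefined ratio; report it as infinity.
    return {
        clade: (cases / controls if controls else float("inf"))
        for clade, (cases, controls) in tally.items()
    }


# Hypothetical assignments, for illustration only.
assignments = (
    [("A", "case")] * 6 + [("A", "control")] * 3
    + [("B", "case")] * 2 + [("B", "control")] * 4
)
ratios = case_control_ratios(assignments)
print(ratios)
```

In this illustration a subject whose haplotype falls in the region with the higher ratio would be regarded as being at relatively increased risk.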

TOMM40 interacts with ApoE directly in regulation of mitochondrial protein import, and a present hypothesis is that the expression of a particular TOMM40 variant(s) exacerbates the relatively moderate risk for Alzheimer's disease associated with the dose-dependent presence of the ApoE 3 allele. Such a TOMM40 variant is discovered within Region B using the methods of the present invention.

Testing new drugs on human subjects carries immense risk (see Kenter and Cohen, Lancet, 368: 1387-91 (2006)). The use of phylogenetic trees to anticipate individual response to a drug or treatment of interest has potential to alleviate that risk significantly. Preliminary studies indicated that rosiglitazone (Avandia) may have genetic-profile specific efficacy in the treatment of Alzheimer's disease (see Risner et al., The Pharmacogenomics Journal 6, 246-254 (2006); Brodbeck et al., Proc. Nat. Acad. Sci. 105, 1343-6 (2008)). Phase II clinical trial data indicate that Alzheimer's disease patients without an ApoE 4 allele responded better to rosiglitazone than patients who carry either 1 or 2 ApoE 4 alleles (data not shown). This supports the hypothesis that variants identified with the methods taught herein may be used to anticipate individual response to treatment based upon genotype.

Example 2: Identification of TOMM40 Variants of Interest

174 sequences (2 from each of 87 subjects) were aligned using the CLUSTAL X program (version 2.0.10, Larkin et al., Clustal W and Clustal X version 2.0. Bioinformatics, 23:2947-2948 (2007)). The multiple sequence alignment was used to construct a phylogenetic tree using a neighbor joining algorithm (Saitou and Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol., 4:406-425 (1987)) as implemented on the European Bioinformatics Institute (EBI) website.
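Neighbor joining operates on a matrix of pairwise distances between the aligned sequences. As a minimal sketch of that input only (not of the tree-building step, and not of the EBI implementation), the proportion of differing sites (p-distance) can be computed as follows. The handling of gaps by pairwise deletion, the function names, and the toy sequences are assumptions made for illustration.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (pairwise deletion),
    which is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(aligned):
    """Map each ordered pair of sequence names to its p-distance."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}


# Toy aligned haplotypes, for illustration only.
aligned = {"hapA1": "ACGTTTTA", "hapA2": "ACGTTTTA", "hapB1": "ACGT--CA"}
distances = distance_matrix(aligned)
print(distances[("hapA1", "hapB1")])
```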

The resulting phylogenetic tree has a structure of two major groups (A, B) at the first divergence. The ApoE genotype frequencies for these groups are tabulated and shown in FIG. 5. It is clear that group B contains subject-haplotypes of primarily ε3/ε3 and ε3/ε4 ApoE genotypes and almost no ε4/ε4. Group A contains almost all of the subject haplotypes with the ε4/ε4 genotype.

The list of polymorphisms generated by the SNP discovery platform (Polymorphic) was used to identify specific variants in the TOMM40 gene that separated the data into the two groups. A likelihood ratio test was used to identify significant variants with a p value less than 0.005.
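One common form of likelihood ratio test for a two-allele variant against the two tree groups is the G-test on a 2x2 table of counts. The sketch below assumes that form, which the text does not specify, and uses hypothetical counts; the p value uses the chi-square distribution with one degree of freedom, whose upper tail equals erfc(sqrt(G/2)).

```python
import math


def g_test_2x2(a, b, c, d):
    """Likelihood ratio (G) test of independence for a 2x2 table.

    Rows are alleles, columns are tree groups:
        [[a, b],
         [c, d]]
    Returns (G, p) with p from a chi-square distribution with 1 df.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to G
                expected = row_totals[i] * col_totals[j] / n
                g += 2.0 * o * math.log(o / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p


# Hypothetical allele-by-group counts, for illustration only.
g_stat, p_value = g_test_2x2(30, 2, 3, 40)
print(g_stat, p_value, p_value < 0.005)
```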

The list of variants is summarized in Table 1. In the table, the term “deletion” is used when the minor allele is a deletion of a nucleotide, and the term “insertion” is used when the minor allele is an addition of a nucleotide. The term “deletion/insertion polymorphism” is used when there are more than two possible forms and the minor allele is not apparent. For example, for the poly-T polymorphisms, there are multiple length polymorphisms observed. The second column of the table provides information on the identities of the specific alleles associated with the variant that divide the sequences into the two groups. For example, T>A indicates that the T allele segregates sequences into group “A” on the phylogenetic tree. When two alleles are listed, e.g., G>B; A>A, each allele uniquely segregates the sequence data into the two groups, while when a single allele is listed it is associated with the predominant separation of the data, and the remaining allele does not uniquely separate the data into a homogenous group, but instead a mixture of both groups.

TABLE 1 TOMM40 variants associated with groups on phylogenetic tree that distribute by ApoE genotype.

Variant | Allele > tree group | Genomic Location (NCBI Build 36.3) | UCSC Function | Classification
50,092,565 | T > A | 50,092,565 | Intron 6 | single
50,092,587 | T > A | 50,092,587 | Intron 6 | single
rs8106922 | G > B; A > A | 50,093,506 | Intron 6 | single
rs34896370, rs55821237, rs56290633 | T12_C_T15, T12_C_T16, T13_C_T14, T13_C_T15, T13_C_T16 > A; T14_C_T14, T14_C_T15 > B | 50,093,609 | Intron 6 | complex
rs34878901 | T > B; C > A | 50,094,317 | Intron 6 | single
rs35568738 | C > B | 50,094,558 | Intron 6 | single
rs10602329 | T16, 17, 18 > A; T14, 15 > B | 50,094,716 | Intron 6 | insertion/deletion
50,094,733 | - > A | 50,094,733 | Intron 7 | insertion
rs10524523 | T12, 14, 15, 16, 17 > B; T21, 22, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36 > A | 50,094,889 | Intron 6 | insertion/deletion
rs1160985 | T > B; C > A | 50,095,252 | Intron 6 | single
50,095,506 | T > A | 50,095,506 | Intron 6 | single
rs760136 | A > A; G > B | 50,095,698 | Intron 6 | single
rs1160984 | T > B | 50,095,764 | Intron 6 | single
rs741780 | C > B; T > A | 50,096,271 | Intron 8 | single
rs405697 | A > A | 50,096,531 | Intron 9 | single
50,096,647 (DIP3) | - > A | 50,096,647 | Intron 9 | deletion
50,096,697 | C > A | 50,096,697 | Intron 9 | single
rs1038025 | C > B; T > A | 50,096,812 | Intron 9 | single
rs1038026 | G > B; A > A | 50,096,902 | Intron 9 | single
rs1305062 | C > B; G > A | 50,097,361 | Intron 9 | single
rs34215622 | G > B; - > A | 50,098,378 | Exon 10 | insertion
rs10119 | A > A | 50,098,513 | Exon 10 | single
rs7259620 | G > A; A > B | 50,099,628 | unknown | single

Example 3: Two Distinct Forms of ApoE 3: Those Linked to TOMM40 Haplotypes that Increase Risk and Decrease Age of Onset, and Those that Decrease Risk

The association of apolipoprotein E (ApoE) genotypes, particularly ApoE ε4 (ApoE 4), with the risk and age of onset of Alzheimer's disease (AD) remains the most confirmed genetic association for any complex disease. Estimates of the heritability of ApoE 4 for late onset AD range from 58% to 79%, and the population attributable risk due to the ApoE 4 allele is between 20% and 70%. These estimates suggest that other genetic variants and/or interactions between variants incur additional disease risk and modify age of onset distributions.

Genome wide scan association results for AD have consistently reproduced the extraordinary association of the LD region containing ApoE. TOMM40, the protein translocase of the outer mitochondrial membrane, is in high LD with ApoE, and codes for the membrane channel through which cytoplasmic peptides and proteins traverse in order to synthesize new mitochondria. Our objectives were to identify additional haplotypes within the LD region that increase the estimates of heritability.

Methods: We examined the LD region containing both ApoE and TOMM40 using deep (10×) primary sequencing in AD patients and controls. We performed phylogenetic analyses of the LD region covering TOMM40 and ApoE in 66 patients and 66 age-matched controls with respect to risk and age of onset distribution.

Conclusion: We found that unique and distinct inherited families of different TOMM40 variants are located on the same genomic interval as ApoE 3, but not on the ApoE 4-containing genomic interval, and can either increase or decrease the age of risk distribution of AD. Therefore, the genetic inheritance of these TOMM40 variants is independent of the inheritance of ApoE 4, effectively providing a differentiation of two distinct forms of ApoE 3: those linked to TOMM40 haplotypes that increase risk and decrease age of onset, and those that decrease risk. These data increase the accuracy of genetic age of onset risk, dependent on age, ApoE and TOMM40 genotypes, and provide the opportunity to define high risk of AD over the next 5-7 years, versus lower risk of AD.

Example 4: Analysis of Three Identified TOMM40 DIP Variants

Three of the TOMM40 variants identified in this application are deletion/insertion polymorphisms (DIPs) located in intron 6 or intron 9. These DIPs are identified as rs10524523 and rs10602329 in the National Center for Biotechnology Information dbSNP database, and a previously undescribed polymorphism, designated as DIP3. These polymorphisms are located at chr19:50,094,889, chr19:50,094,731, and chr19:50,096,647, respectively, according to NCBI build 36. This invention describes the identification of these DIPs using phylogenetic analysis of the TOMM40 gene, specifically of a 10 Kb fragment of the gene, and that the DIPs are associated with different evolutionary groups determined by phylogenetic analysis. This invention further discloses the utility of these DIPs for (1) determining risk of a healthy person for developing Alzheimer's disease in the future, and (2) for predicting age of onset of AD within an approximately 8 year time-frame.

The three DIP polymorphisms characterized herein correspond to different lengths of DIP poly-T repeats in the TOMM40 gene. The association of DIP poly-T variants with disease risk has precedent. For example, a poly-T variant in intron 8 of the cystic fibrosis transmembrane conductance regulator (CFTR) gene is associated with skipping of exon 9 and the development of cystic fibrosis (Groman et al., Am J Hum Genet 74(1):176-9 (2004)). Herein is disclosed: (1) use of the novel method, phylogenetic association analysis (described above), to identify DIPs that are predictive of disease risk and/or differences in age of disease onset, (2) the identity of three specific DIPs associated with differences in AD age of onset and AD risk, (3) the use of these SNPs individually, together, or with other sequence variants in TOMM40 or ApoE to diagnose disease or predict or determine disease characteristics such as age of disease onset, disease prognosis, disease sub-types, disease severity, and also to analyze or determine the response to drugs.

Phylogenetic analysis reveals the distribution of rs10524523 and rs10602329 DIPs into two different clades. This analysis reveals that shorter poly-T lengths at these loci map to the phylogenetically-identified clades in group B, the group that also comprises higher percentages of ApoE ε3/ε3 genotype subjects, effectively few (0%) ApoE ε4/ε4 subjects and lower case/control ratios (i.e., AD disease risk) (FIG. 5). The association between DIP length and phylogenetic group is statistically significant (p<0.0001) by the likelihood ratio test or Pearson Chi-square test.

Due to the genomic architecture, the high linkage disequilibrium and the evolutionary relationships between the two genes, as indicated by the phylogenetic analysis, and the putative physical interaction between the two gene products, the influence of TOMM40 genotype is likely to extend to other diseases that are influenced by ApoE genotype. These diseases include, but are not limited to, Parkinson's disease, multiple sclerosis, cardiovascular disease, dyslipidemia, recovery from traumatic brain injury, recovery from brain ischemic events, response to anaesthetics, and response to drugs used to treat AD and the diseases listed here.

These polymorphisms could also be used in drug discovery efforts for the screening of compounds useful for treating diseases influenced by variations in TOMM40 or ApoE protein or gene variants.

In addition, the variants may influence or determine therapies based on specifically targeted biopharmaceuticals as exemplified by monoclonal antibodies and siRNA molecules.

The DIP polymorphisms in TOMM40 that are disclosed herein can be identified from an individual's DNA sample using many different molecular nucleotide analysis methodologies, including, but not limited to, DNA sequencing with the primers denoted in Table 4 listed below.

Example 5: Longer Poly-T Tracts at rs10524523 are Significantly Correlated with Earlier Age of Onset of LOAD

Phylogenetic analysis has been used to identify genomic relationships between low frequency genetic variants and to cluster evolutionarily related haplotypes (Hahn et al. Population genetic and phylogenetic evidence for positive selection on regulatory mutations at the factor VII locus in humans. Genetics 167, 867-77 (2004)). This methodology was employed to explore the ApoE-TOMM40 LD block for the existence of novel risk determinants for LOAD. In an exploratory study, 23 Kb of DNA containing the TOMM40 and ApoE genes were amplified and sequenced, and phase-resolved haplotypes were determined, for 72 LOAD cases and 60 age-matched controls (Li et al. Candidate single-nucleotide polymorphisms from a genomewide association study of Alzheimer disease. Arch Neurol 65, 45-53 (2008)). It was possible to construct a distinct phylogenetic tree for 10 Kb, encoding exons 2-10, of this region. Two clades (A and B) were distinguished with strong bootstrap support (98%, 1000 replicates). There was a significant difference in the distribution of the ApoE genotypes between the two clades of TOMM40 haplotypes on this phylogenetic tree, suggesting that this region could be functionally significant. Both clades contained subjects with the ε3/ε3 genotype, but 98% of all clade B haplotypes occurred in cis with the ApoE ε3 allele (P=1.2×10⁻¹⁸, Fisher's exact test, two-tailed).
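For readers unfamiliar with the two-tailed Fisher's exact test used in this Example, the following minimal sketch computes it for a 2x2 table by summing the hypergeometric probabilities of all tables, with the same margins, that are no more probable than the observed table. This is one common definition of the two-tailed p value and is assumed here; the counts shown are a small textbook-style table, not study data.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so ties are not lost to rounding.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Small illustrative table (not data from the study).
p_value = fisher_exact_two_tailed(8, 2, 1, 5)
print(round(p_value, 6))  # 0.034965
```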

The phylogenetic structure of this 10 Kb region of TOMM40, the ApoE ε3-specific inheritance of particular haplotypes, and the identity of the clade-specific polymorphisms were subsequently confirmed in two independent LOAD case/control cohorts, including one cohort with autopsy-confirmed AD status and age of disease onset data. The association between the two clades and disease risk and age of disease onset, where the data was available, was also explored for these two cohorts. The first cohort (AS) comprised AD cases (n=74) and controls (n=31) ascertained at the Arizona Alzheimer's Disease Research Center (ADRC). The second cohort (DS) was assembled at the Duke Bryan ADRC and comprised ApoE ε3/ε4 subjects only (40 autopsy-confirmed cases with known age of disease onset and 33 controls) (Table 2). Although DNA sequencing was successful for a subset of the DS cohort who had disease onset from 50 to 68 years of age, association analyses were limited to a subset of patients who developed AD after the age of 60.

TABLE 2 Cohort compositions. The number of cases and controls, mean age, and percentage that are female are shown for each series. Mean age is given as age-at-diagnosis of AD for cases and age-at-examination for controls. The standard deviation from the mean is given in parentheses.

Series | n (Cases) | n (Controls) | Mean Age (SD), Cases | Mean Age (SD), Controls | % Females, Cases | % Females, Controls
AS | 74 | 31 | 81.7 (8.01) | 77 (8.93) | 56.3 | 46.7
DS | 40 | 33 | 69.3 (8.3) | 71.9 (7.5) | 70 | 66.7

A phylogenetic tree of similar structure to that generated in the exploratory study was developed with strong bootstrap support (97%, 1000 replicates) for the AS cohort. ApoE ε4/ε4 subjects occurred only in clade A (98% separation between groups, P=2.0×10⁻⁴, Fisher's exact test, two-tailed), while the remaining ApoE genotypes were distributed between clades A and B (FIG. 6). That is, ApoE ε4 was always in LD with clade A variants whereas ApoE ε3 occurred in both clade A and clade B haplotypes. Examination of the distribution of the few ApoE ε2/ε4 subjects on the phylogenetic tree suggests that ApoE ε2-TOMM40 haplotypes share a similar evolutionary history with ApoE ε3-TOMM40 haplotypes (data not shown). To verify the phylogenetic structure using a separate method, and to ensure that recombination within the genetic interval did not confound the phylogenetic tree structure developed for the AS cohort, haplotype networks were also constructed using statistical parsimony (TCS version 1.21 (Clement et al. TCS: a computer program to estimate gene genealogies. Mol Ecol 9, 1657-9 (2000))). The major subject-haplotype clusters derived from the two methods (maximum parsimony and TCS) were congruent.

Clade A was more frequently associated with AD cases than was clade B (OR=1.44, 95% CI=0.76-2.70). ApoE ε3/ε4 heterozygotes (n=36) were analyzed to estimate disease risk associated with clade A haplotypes while controlling for the effect of ApoE ε4. There was a trend to higher incidence of LOAD for the subset that was homozygous for TOMM40 clade A relative to the subset that was heterozygous for clade A and clade B (OR=1.36, 95% CI=0.40-4.61) and thus it was postulated that at least some of the TOMM40 variants which define clade A confer ApoE ε4-independent risk of LOAD.

Analysis of the AS cohort sequence data identified 39 polymorphic sites in the TOMM40 10 Kb region, of which there were 30 parsimony-informative sites (at least two different nucleotides, each represented in at least two sequences). Of the 30 parsimony-informative sites, 18 had a minor allele frequency (MAF)>0.10 and six SNPs were outside the boundary of the TOMM40 gene. 10 SNPs occurred exclusively in the context of ApoE ε3 (P=6.07×10⁻⁵⁰, Fisher's exact test, two-tailed, n=210) and were never observed in ApoE ε4/ε4 homozygous subjects (n=16). The majority of the ε3-specific TOMM40 variants were located in intronic regions.
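The definition of a parsimony-informative site given above (at least two different nucleotides, each represented in at least two sequences) can be checked column by column in an alignment, as in the following sketch. The function names and the toy alignment are hypothetical; the minor allele frequency is taken here as the frequency of the second most common nucleotide, which is an assumption suited to two-allele sites.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two nucleotides each occur in at least two sequences."""
    counts = Counter(base for base in column if base in "ACGT")
    return sum(1 for count in counts.values() if count >= 2) >= 2


def minor_allele_frequency(column):
    """Frequency of the second most common nucleotide (0.0 if monomorphic)."""
    counts = Counter(base for base in column if base in "ACGT")
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    return counts.most_common(2)[1][1] / total


# Toy alignment (one string per sequence), for illustration only.
alignment = ["ACGA", "ACGA", "ATGG", "ATGG", "ACGA"]
columns = ["".join(col) for col in zip(*alignment)]
informative_sites = [i for i, col in enumerate(columns) if is_parsimony_informative(col)]
print(informative_sites)
print([round(minor_allele_frequency(columns[i]), 2) for i in informative_sites])
```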

FIGS. 7A and 7B illustrate the 10 SNPs and 6 insertion/deletion polymorphisms that distinguish TOMM40 clades A and B (at P<0.001) for the ApoE ε3/ε3 subjects from the AS cohort. These polymorphisms were tested individually and as haplotypes for association with LOAD risk (Table 3). The odds ratios for disease risk for each clade B allele, in all cases the minor allele, suggest that the clade B alleles are protective of AD risk in the AS cohort; however, in each case the association narrowly missed significance. To account for the effect of ApoE ε4 on the odds ratios reported in Table 3, a balanced set of 48 AD cases and 48 AD controls was constructed by selecting sequences at random from ApoE ε3/ε4 subjects from the pooled AS and DS cohorts. Single SNPs again were not significantly associated with LOAD in this balanced data set. However, the minor alleles of four of the SNPs (rs8106922, rs1160985, rs760136, rs741780) that distinguish TOMM40 clade B were assayed previously in three LOAD case/control genome-wide association studies and were found to be protective of disease risk (OR<1 in each case), which is consistent with the trend observed in our study (Abraham et al. A genome-wide association study for late-onset Alzheimer's disease using DNA pooling. BMC Med Genomics 1, 44 (2008); Carrasquillo et al. Genetic variation in PCDH11X is associated with susceptibility to late-onset Alzheimer's disease. Nat Genet 41, 192-198 (2009); Takei et al. Genetic association study on in and around the ApoE in late-onset Alzheimer disease in Japanese. Genomics 93, 441-448 (2009)).

TABLE 3 Descriptive statistics and allelic and genotypic association results for the individual SNPs.

SNP ID | Position | Allele | Clade B allele | MAF (all) | MAF (cases) | MAF (controls) | LOAD (M) | LOAD (m) | Control (M) | Control (m)
All ApoE genotypes
rs1038025 | 50096812 | T/c | c | 0.31 | 0.28 | 0.37 | 106 | 41 | 39 | 23
rs1038026 | 50096902 | A/g | g | 0.31 | 0.28 | 0.37 | 106 | 41 | 39 | 23
rs1160985 | 50095252 | C/t | t | 0.30 | 0.28 | 0.37 | 107 | 41 | 39 | 23
rs1305062 | 50097361 | G/c | c | 0.28 | 0.26 | 0.31 | 106 | 38 | 43 | 19
rs34215622 | 50098378 | -/g | g | 0.28 | 0.26 | 0.34 | 110 | 38 | 40 | 21
rs34878901 | 50094317 | C/t | t | 0.26 | 0.25 | 0.28 | 105 | 35 | 44 | 17
rs7259620 | 50099628 | G/a | a | 0.30 | 0.27 | 0.37 | 108 | 40 | 39 | 23
rs741780 | 50096271 | T/c | c | 0.30 | 0.28 | 0.37 | 107 | 41 | 39 | 23
rs760136 | 50095698 | A/g | g | 0.30 | 0.28 | 0.37 | 107 | 41 | 39 | 23
rs8106922 | 50093506 | A/g | g | 0.28 | 0.26 | 0.31 | 109 | 39 | 43 | 19
APOE ε3/ε4
rs1038025 | 50096812 | T/c | c | 0.28 | 0.25 | 0.38 | 68 | 28 | 63 | 33
rs1305062 | 50097361 | G/c | c | 0.27 | 0.24 | 0.38 | 69 | 25 | 64 | 32
rs34215622 | 50098378 | -/g | g | 0.28 | 0.25 | 0.38 | 70 | 26 | 64 | 32
rs34878901 | 50094317 | C/t | t | 0.24 | 0.20 | 0.38 | 69 | 25 | 61 | 31
rs8106922 | 50093506 | A/g | g | 0.28 | 0.25 | 0.38 | 70 | 25 | 64 | 32

SNP ID | LOAD (MM) | LOAD (Mm) | LOAD (mm) | Control (MM) | Control (Mm) | Control (mm) | OR | 95% CI lower | 95% CI upper
All ApoE genotypes
rs1038025 | 40 | 27 | 7 | 11 | 17 | 3 | 0.66 | 0.35 | 1.23
rs1038026 | 40 | 27 | 7 | 11 | 17 | 3 | 0.66 | 0.35 | 1.23
rs1160985 | 40 | 27 | 7 | 11 | 17 | 3 | 0.65 | 0.34 | 1.19
rs1305062 | 43 | 24 | 7 | 13 | 17 | 1 | 0.81 | 0.42 | 1.56
rs34215622 | 42 | 26 | 6 | 12 | 17 | 2 | 0.66 | 0.35 | 1.25
rs34878901 | 45 | 23 | 6 | 15 | 15 | 1 | 0.86 | 0.44 | 1.70
rs7259620 | 41 | 26 | 7 | 11 | 17 | 3 | 0.63 | 0.33 | 1.18
rs741780 | 40 | 27 | 7 | 11 | 17 | 3 | 0.65 | 0.34 | 1.19
rs760136 | 40 | 27 | 7 | 11 | 17 | 3 | 0.65 | 0.34 | 1.19
rs8106922 | 42 | 25 | 7 | 13 | 17 | 1 | 0.81 | 0.42 | 1.55
APOE ε3/ε4
rs1038025 | 22 | 24 | 2 | 17 | 29 | 2 | 0.79 | 0.43 | 1.45
rs1305062 | 25 | 21 | 2 | 17 | 30 | 1 | 0.72 | 0.39 | 1.35
rs34215622 | 24 | 22 | 2 | 17 | 30 | 1 | 0.74 | 0.40 | 1.38
rs34878901 | 25 | 21 | 2 | 18 | 29 | 1 | 0.71 | 0.38 | 1.34
rs8106922 | 25 | 21 | 2 | 17 | 30 | 1 | 0.71 | 0.38 | 1.33
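Table 3 does not state how the odds ratios and confidence intervals were computed. As an illustration under a stated assumption, the sketch below computes an allelic odds ratio for the minor allele with a Woolf (log odds ratio) 95% confidence interval from the allele counts in the LOAD (M), LOAD (m), Control (M), and Control (m) columns. For the rs1038025 row (all ApoE genotypes) this assumed method gives values that round to the tabulated 0.66, 0.35, and 1.23; the function name is hypothetical.

```python
import math


def allelic_odds_ratio(case_major, case_minor, control_major, control_minor, z=1.96):
    """Odds ratio for the minor allele with a Woolf (logit) confidence interval.

    Returns (odds_ratio, lower, upper). z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (case_minor * control_major) / (case_major * control_minor)
    standard_error = math.sqrt(
        1 / case_major + 1 / case_minor + 1 / control_major + 1 / control_minor
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * standard_error)
    upper = math.exp(log_or + z * standard_error)
    return odds_ratio, lower, upper


# Allele counts for rs1038025, all ApoE genotypes, taken from Table 3.
odds_ratio, lower, upper = allelic_odds_ratio(106, 41, 39, 23)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))  # 0.66 0.35 1.23
```

An interval that spans 1, as here, corresponds to the statement that the association did not reach significance.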

Another polymorphism that distinguished the two clades and, therefore, two groups of ApoE ε3 haplotypes, was a poly-T variant (rs10524523) located in intron 6 of TOMM40. On ApoE ε4 chromosomes, the variant was relatively long, with a narrow, unimodal distribution of lengths (21-30 T residues, mean=26.78, s.d.=2.60, n=32), whereas on ApoE ε3 chromosomes, a bimodal distribution of lengths was evident with peaks at 15.17 (s.d.=0.85, n=36) and 33.15 (s.d.=2.09, n=55) T residues (FIGS. 8A to 8C). Longer poly-T lengths (T>=27) segregated almost exclusively into clade A, the higher risk clade, in the AS cohort (P=7.6×10⁻⁴⁶, n=210, Fisher's exact test, two-tailed). The case/control ratio for the category containing the two most common shorter lengths (15 or 16 T residues) was 1.46 (95% CI=1.25-1.75), and the case/control ratio for the longer length category (28, 29, 33 and 34 T residues) was 2.02 (95% CI=1.13-2.87). These data showed a trend toward an association between the longer rs10524523 poly-T length and AD (OR=1.38, 95% CI=0.80-2.39).

While there were only trends toward association of TOMM40 haplotypes or individual polymorphisms with LOAD for the AS cohort, there was a significant association between poly-T length category of rs10524523 and age of LOAD onset. This was tested using the DS cohort of autopsy-confirmed ApoE ε3/ε4 subjects for whom there was disease onset data. Longer poly-T alleles (>=27 T residues) were significantly associated with onset of disease at a much younger age (70.5±1.2 years versus 77.6±2.1 years, P=0.02, n=34) (FIG. 5).
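The length categories used above can be illustrated with a minimal Python sketch. It assumes a sequencing read is available as a plain string and takes the longest uninterrupted run of T as the poly-T length. The function names `longest_t_run` and `poly_t_category` and the example read are hypothetical; only the cut-off of 27 T residues comes from the text.

```python
import re

LONG_CUTOFF = 27  # ">=27 T residues" is the cut-off quoted in the text


def longest_t_run(read: str) -> int:
    """Return the length of the longest uninterrupted run of T (0 if none)."""
    runs = re.findall(r"T+", read.upper())
    return max((len(run) for run in runs), default=0)


def poly_t_category(length: int) -> str:
    """Classify a poly-T length as 'long' (>= cut-off) or 'short'."""
    return "long" if length >= LONG_CUTOFF else "short"


# Hypothetical read: 16 T residues between invented flanking bases.
example_read = "GACC" + "T" * 16 + "GAGA"
example_length = longest_t_run(example_read)
print(example_length, poly_t_category(example_length))
```

A real pipeline would first anchor the read on the flanking sequence of rs10524523 rather than take the longest T run anywhere in the read; that step is omitted here.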

This polymorphism, therefore, significantly impacted age of disease onset for individuals who carry an ApoE ε3 allele. Three other poly-T length polymorphisms located in intron 6 (rs34896370, rs56290633 and rs10602329) also distinguish clades A and B, but these polymorphisms were not associated with age of disease onset. Similarly, there was no relationship between age of LOAD onset and haplotypes of clade-distinguishing SNPs, or the single SNP rs8106922, which had been significantly associated with AD risk in three genome-wide association studies (Abraham et al. A genome-wide association study for late-onset Alzheimer's disease using DNA pooling. BMC Med Genomics 1, 44 (2008); Carrasquillo et al. Genetic variation in PCDH11X is associated with susceptibility to late-onset Alzheimer's disease. Nat Genet 41, 192-198 (2009); Takei et al. Genetic association study on in and around the ApoE in late-onset Alzheimer disease in Japanese. Genomics 93, 441-448 (2009)) (data not shown).

We conclude that longer poly-T tracts at rs10524523 are significantly correlated with earlier age of onset of LOAD. The length of this variant is relatively homogeneous, and relatively long, on ApoE ε4 chromosomes, whereas there are two categories of poly-T lengths linked to ApoE ε3. ApoE ε2 chromosomes also appear to carry variable-length poly-T repeats similar to ε3 chromosomes, but further investigation is needed to verify this preliminary finding and to determine if the poly-T repeat impacts the very late age of disease onset for carriers of ApoE ε2.

While it is possible that there are other variants that influence age of onset of LOAD for individuals who are not homozygous for ApoE ε4, the length of the poly-T polymorphism in TOMM40 intron 6 appears to be the most powerful genetic predictor in this linkage region and should be validated prospectively. These data suggest that ApoE genotype-stratified age of onset curves (Corder et al. Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer's disease in late onset families. Science 261, 921-3 (1993); Li et al. Candidate single-nucleotide polymorphisms from a genomewide association study of Alzheimer disease. Arch Neurol 65, 45-53 (2008)) are, in reality, sets of curves with each curve reflecting a specific interaction of linked polymorphisms in ApoE and TOMM40. Therefore, these data add resolution to the prediction of age of LOAD onset, within a 5-7 year window, for individuals over 60 years of age. The study to validate the association of ApoE genotypes and TOMM40 haplotypes or rs10524523 with age of disease onset is currently being planned. This study will be a prospective, 5 year, population-based study conducted in several ethnic groups, and will be combined with a prevention or delay of disease onset drug trial.

Methods

The two cohorts analyzed in this study were from the Arizona Alzheimer's Disease Research Center (ADRC), Phoenix, Ariz. and the Duke Bryan ADRC, Durham, N.C. All subjects were of European descent. The Arizona and Duke studies were approved by institutional review boards and appropriate informed consent was obtained from all participants. Age and gender data for the cases and controls in each cohort are shown in Table 2. For the Duke cohort, the age of disease onset was determined retrospectively and disease diagnosis was confirmed by autopsy.

Samples were plated on 96 well plates for long-range PCR and DNA sequencing at Polymorphic DNA Technologies (Alameda, Calif.).

Long-range PCR was performed using Takara LA Taq polymerase (Takara Mirus Bio). The reaction mix and PCR conditions were the same as those recommended by the manufacturer. PCR was conducted in a 50 μL volume with 2.5 U of LA Taq and 200-400 ng human genomic DNA. Thermocycling was carried out with the following conditions: 94° C. for 1 min (1 cycle); 14 cycles of 94° C. for 30 sec, 57° C. for 30 sec, and 68° C. for 9 min; 16 cycles of 94° C. for 30 sec, 57° C. for 30 sec, and 68° C. for 9 min plus 15 sec per cycle; and 72° C. for 10 min (1 cycle). Primers for long-range PCR are shown in Table 4.
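As a rough check on run length, the thermocycling program can be written out as data and its total hold time summed. This is only a sketch under stated assumptions: the second-stage annealing temperature is read as 57° C., the "+15 sec/cycle" extension is taken to start from 9 min in the first cycle of that stage and grow by 15 sec in each later cycle, and ramp times between temperatures are ignored. The names `STAGE_1`, `STAGE_2` and `stage_seconds` are illustrative only.

```python
# Each step is (temperature in deg C, hold time in seconds).
INITIAL_DENATURATION = (94, 60)
FINAL_EXTENSION = (72, 600)

# Annealing at 57 deg C in the second stage is an assumption (see lead-in).
STAGE_1 = {"cycles": 14, "steps": [(94, 30), (57, 30), (68, 9 * 60)]}
STAGE_2 = {
    "cycles": 16,
    "steps": [(94, 30), (57, 30), (68, 9 * 60)],
    "extension_increment": 15,  # extra seconds added per successive cycle
}


def stage_seconds(stage: dict) -> int:
    """Total hold time of a cycling stage, ignoring ramp times."""
    base = sum(seconds for _, seconds in stage["steps"])
    increment = stage.get("extension_increment", 0)
    # Assumption: cycle 1 uses the base time, cycle k adds (k - 1) * increment.
    extra = increment * sum(range(stage["cycles"]))
    return stage["cycles"] * base + extra


total_seconds = (
    INITIAL_DENATURATION[1]
    + stage_seconds(STAGE_1)
    + stage_seconds(STAGE_2)
    + FINAL_EXTENSION[1]
)
print(f"total hold time: {total_seconds / 3600:.2f} h")
```

Under these assumptions the hold times alone come to a little under six hours, which is dominated by the 9 min extension steps needed for the roughly 9 kb product.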

TABLE 4 Forward and reverse sequencing primers are listed. The shaded row indicates the forward and reverse primers used for long-range PCR of R2 (FIG. 2). UCSC Coordinate and Primer Position in Cloned PCR Product are given for the 3′-end of each primer.

Forward Primers                                        Reverse Primers
Sequence                   UCSC        Position  SEQ   Sequence                   UCSC        Position  SEQ
                           Coordinate  in Cloned ID                               Coordinate  in Cloned ID
                                       PCR       NO:                                          PCR       NO:
                                       Product                                                Product
AACTCAGAGGCCAGAGATTCTAAGT  50,092,429  25        1     AACAGCCTAATCCCAGCACATTTAC  50,101,560  9156      2
CAGGAAACAGCTATGAC          50,092,292  −112      3     CCCACTGGTTGTTGA            50,093,034  630       4
GTGTGATGGTGATTCAAC         50,093,038  634       5     GAATAGGGGCCTTTCA           50,093,282  878       6
CTGCAGGTATGAAAG            50,093,287  883       7     CAATCTCCTAGGGTGC           50,093,512  1108      8
GTCTCTGCAGATGTG            50,093,601  1197      9     CGGAAGTTGCAGTAAG           50,093,706  1302      10
TACTGCAACTTCCGC            50,093,722  1318      11    AAGGTCAAGGTTACACT          50,094,318  1914      12
TCTCTGTTGCCCACG            50,094,289  1885      13    ACAAGCCTAGGTGACAT          50,094,790  2386      14
CCCAACTAATTTTTGTATTCG      50,094,609  2205      15    CCTGTAATCCCAGCTAT          50,095,002  2598      16
ACATTTGTGGCCTGTAC          50,095,129  2725      17    TCATCTCTCTGTGAACCTAA       50,095,324  2920      18
CCACATGGGCTTGTGT           50,095,603  3199      19    GGCAAAATGACGATCAGT         50,095,804  3400      20
CCCAGATGCCCAAATC           50,096,082  3678      21    GCAGCACCAGCTAGT            50,096,218  3814      22
AACTCTGAGTGGATGTG          50,096,471  4067      23    GATGGTCTCAATCTCCTTA        50,096,620  4216      24
CTATAGTCCCAACTACTGA        50,096,730  4326      25    TTTTTTCCAAGCATAAAACATAGTA  50,096,863  4459      26
AGTCCCCGCTACTTA            50,097,080  4676      27    GGGGATGGACAAAGCT           50,097,268  4864      28
ACCACAGGTGTATGCC           50,097,451  5047      29    TGAAAAGCCCTCTAGAC          50,097,898  5494      30
GAACAGATTCATCCGCA          50,097,864  5460      31    CACCCACGATCCAGTT           50,098,141  5737      32
TGTGGATAGCAACTGGAT         50,098,148  5744      33    CAAAGCCACACTGAAACTT        50,098,231  5827      34
GGGATTCTGAGTAGCA           50,098,469  6065      35    CAGAATCCTGCGT              50,098,526  6122      36
TGCTGCCTTAAGTCCG           50,098,937  6533      37    ACACTTGAGAAAACGG           50,098,797  6393      38
CTGGGGTCAGCTGAT            50,099,350  6946      39    ACAAAGTCCTCTATAGCC         50,099,077  6673      40
TGAAACATCTGGGATTTATAAC     50,099,679  7275      41    TAACCTGGGGTTGGTT           50,099,429  7025      42
CTGGAAACCACAATACC          50,099,990  7586      43    AAGTTCCTTTGCTCATCAG        50,099,829  7425      44
ATCTCGGCTCACTGTA           50,100,261  7857      45    GCAAGAGGGAGACTGT           50,100,207  7803      46
GTCAAAAGACCTCTATGC         50,100,739  8335      47    TGTGCCTGGATGAATGTA         50,100,567  8163      48
AGGACTCCACGAGT             50,101,197  8793      49    TGAGCTCATCCCCGT            50,100,960  8556      50
CCGTGTTCCATTTATGAG         50,101,328  8924      51    GTAAAACGACGGCCAG           50,101,681  9277      52
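The two position columns of Table 4 should differ by a single constant offset, since both describe the 3′-end of the same primer. The short Python check below illustrates this for a handful of rows copied from the table; the variable and function names are hypothetical, and the offset is simply computed from the tabulated values rather than taken from any stated definition.

```python
# (UCSC coordinate of 3'-end, position of 3'-end in cloned PCR product),
# copied from a subset of Table 4 rows (SEQ ID NOs 1, 2, 3, 4, 43 and 52).
PRIMER_POSITIONS = [
    (50_092_429, 25),
    (50_101_560, 9_156),
    (50_092_292, -112),
    (50_093_034, 630),
    (50_099_990, 7_586),
    (50_101_681, 9_277),
]

# If the table is internally consistent, every row yields the same offset.
offsets = {ucsc - product for ucsc, product in PRIMER_POSITIONS}
print(offsets)


def product_to_ucsc(product_position: int, offset: int) -> int:
    """Convert a cloned-product position to a UCSC coordinate."""
    return product_position + offset


offset = next(iter(offsets))
print(product_to_ucsc(1108, offset))
```

A check of this kind is also a convenient way to catch digits lost or merged when a table is transcribed.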

PCR products were run on a 0.8% agarose gel, visualized with crystal violet dye, compared to size standards, cut out of the gel, and extracted with purification materials included with the TOPO XL PCR Cloning kit (Invitrogen). Long-range PCR products were cloned into a TOPO XL PCR cloning vector. This system uses a TA cloning vector and is recommended for inserts of up to 10 kb. Per the manufacturer's instructions, electro-competent cells (from the same kit) were transformed with the vector, plated in the presence of antibiotic, and incubated. Ten clones from each plate were picked and cultured in a 96-well format.

Diluted cultures were transferred to a denaturing buffer that was part of the TempliPhi DNA Sequencing Template Amplification kit (GE HealthCare/Amersham Biosciences). This buffer causes the release of plasmid DNA but not bacterial DNA. Cultures were heated, cooled, spun, and transferred to fresh plates containing the TempliPhi enzyme and other components. This mixture was incubated at 30° C. for 18 hours to promote amplification of the plasmid templates. These products were then spun and heated to 65° C. to destroy the enzyme.

Plasmid templates were used in DNA sequencing reactions using the BigDye version 3.1 sequencing kit (Applied Biosystems). For each reaction, an appropriate sequencing primer (Table 4) was used that was designed to anneal to a unique location of the template. Cycle sequencing was carried out with an annealing temperature of 50° C., an elongation temperature of 60° C., and a denaturation temperature of 96° C., for a total of 30 cycles. Sequencing reaction products were run on an ABI 3730XL DNA sequencer with a 50 cm capillary array using standard run mode.

A proprietary sequencing analysis program called ‘Agent’ (developed by Celera) was used to align sequencing reads to the appropriate reference sequence, and produce ‘contigs’ associated with each clone. The system provides estimated quality scores for all bases for which there is any variation for any of the samples. The sequencing report for each sample was analyzed for the presence of SNPs that were correlated in one haplotype pattern for one subset of clones and in a different haplotype pattern for the remaining clones. A reference file for the region of interest was prepared by listing the known variations for that region publicly available from NCBI dbSNP. A genotype file for the region of interest was created by searching each subject's haplotype report for all variations between the known reference sequence and the consensus haplotype sequences.
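The idea of separating a subject's clones into two haplotype patterns can be illustrated with a small Python sketch. This is not the algorithm of the proprietary program named above, which is not described in the text; it simply groups clones by their allele pattern across the variable sites and keeps the two most frequent patterns. The clone data, the three-site layout and the function name `call_haplotypes` are hypothetical.

```python
from collections import Counter


def call_haplotypes(clone_alleles, max_haplotypes=2):
    """Group clones by allele pattern and return the most frequent patterns.

    clone_alleles: one sequence of alleles per clone, in a fixed site order.
    Rare patterns (for example, clones carrying a sequencing error) are
    dropped because only the max_haplotypes most common patterns are kept.
    """
    counts = Counter(tuple(alleles) for alleles in clone_alleles)
    return [list(pattern) for pattern, _ in counts.most_common(max_haplotypes)]


# Hypothetical subject: ten clones typed at three variable sites.
clones = (
    [("A", "T", "C")] * 5      # clones derived from one chromosome
    + [("G", "C", "T")] * 4    # clones derived from the other chromosome
    + [("G", "C", "C")]        # one clone with a presumed read error
)
haplotypes = call_haplotypes(clones)
print(haplotypes)
```

A subject homozygous across the whole region would yield a single dominant pattern, which corresponds to the "single haplotype" samples used elsewhere in the Methods to estimate length-reading error.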

The magnitude of the length-reading error for the poly-T variants (e.g., rs10524523) was estimated by examining the observed lengths from the 10 clones that were prepared for samples that had a single haplotype. For a typical sample with short poly-T length of 16, the standard deviation for the 10 clones was 0.97. For a typical sample with longer poly-T length, e.g., 27, the standard deviation was 1.58.
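The estimate described above amounts to a standard deviation over the clone-level length calls for one sample. A minimal Python sketch follows; the ten lengths are invented for illustration and are not the study data, and the use of the sample (n−1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def length_reading_error(clone_lengths):
    """Return (mean, sample standard deviation) of per-clone poly-T lengths."""
    return mean(clone_lengths), stdev(clone_lengths)


# Hypothetical length calls from 10 clones of a single-haplotype sample.
example_lengths = [15, 16, 16, 16, 17, 16, 15, 17, 16, 16]
example_mean, example_sd = length_reading_error(example_lengths)
print(f"mean={example_mean:.1f}, s.d.={example_sd:.2f}")
```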

Phylogenetic analysis was conducted as follows. A multiple sequence alignment of the sequences was performed using the ClustalW2 (version 2.0.10) program using default parameters. Manual adjustment of the alignments was completed using Genedoc (version 2.7.000). Phylogenetic trees were constructed using Bayesian, maximum likelihood and distance-based reconstructions. The phylogenetic tree construction software used was PAUP* (version 4.0b10), ClustalX2 (neighbor-joining methods, version 2.0.10) and MrBayes (version 3.1.2).

Tree-bisection and reconnection branch swapping were used in all methods. The best fitting model of sequence evolution was estimated using the Modeltest program (version 3.7) which provided estimates for the following key determinants: rate matrix, shape of the gamma distribution and proportion of invariant sites. Bootstrap analysis was performed using 1000 replicates to determine statistical support for specific tree morphology.

Haplotype networks were also constructed from the sequence data using the program TCS (version 1.21 (Clement et al. TCS: a computer program to estimate gene genealogies. Mol Ecol 9, 1657-9 (2000))) to compare the phylogenetic trees to cladograms estimated using statistical parsimony. The phylogenetic trees and haplotype networks were constructed twice, with gaps treated as missing data for the first instance and as a fifth character for the second instance. Nucleotide diversity in the region of interest was calculated using DnaSP (version 5.00.02 (Librado et al. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25, 1451-2 (2009))).
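For readers unfamiliar with the quantity, nucleotide diversity is commonly defined as the average number of pairwise differences per site among the sampled sequences. The Python sketch below computes that textbook definition for aligned, gap-free sequences of equal length; it is an illustration only and is not claimed to reproduce the options or gap handling of the DnaSP program cited above. The example sequences are invented.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site for aligned, gap-free sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned, non-empty and equal length")
    pairs = list(combinations(sequences, 2))
    differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return differences / (len(pairs) * length)


# Hypothetical alignment: three haplotypes, four sites, one variable site.
example_alignment = ["ACGT", "ACGA", "ACGT"]
example_pi = nucleotide_diversity(example_alignment)
print(round(example_pi, 4))
```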

After construction of the phylogenetic trees, the haplotype network, and completion of the analysis of nucleotide diversity in the region of interest, the results from the different methods were compared and reconciled to a consensus tree. Groups of sequences sharing a recent disease mutation were presumed to segregate more closely on the phylogenetic tree; however, sporadic cases due to phenocopies, dominance and epistasis can introduce noise into the phenotype-haplotype relationship (Tachmazidou et al. Genetic association mapping via evolution-based clustering of haplotypes. PLoS Genet 3, e111 (2007)).

This phylogenetic analysis focused on a high-level aggregation of clades in order to minimize these effects. The clades determined at the first split in the phylogenetic tree were used to test the hypothesis that TOMM40 subject-haplotypes from clade ‘B’ were associated with onset of AD at a later age than subject-haplotypes from clade ‘A’ (each subject contributed two haplotypes to the AD age of onset association signal). The number of tests of association performed using this approach was orders of magnitude smaller than in typical genome-wide association studies since the phylogenetic analysis identified categories of evolutionarily-related subject-haplotypes. If the tests of association confirmed that the different clades classified the subject-haplotype data by age of onset, further statistical analysis was done to identify the variants that separated the sequences into each clade. Effectively, this analysis assessed the significance of each variant as a factor that influences age of onset using a series of one-degree of freedom tests guided by the tree structure. The phylogenetic analyses were conducted using single nucleotide and insertion/deletion polymorphisms. The statistical tests of association were adjusted with a Bonferroni correction for the number of polymorphic sites included in the analysis.
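The Bonferroni adjustment mentioned above divides the family-wise significance level by the number of tests. A minimal Python sketch is given below, using the figures quoted later in this Methods section (30 parsimony-informative sites and a significance level of 0.05); the function names are hypothetical. The quotient is about 0.0017, and the text states the required P value as 0.001, which is slightly stricter than the computed quotient.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if p_value passes the Bonferroni-corrected threshold."""
    return p_value <= bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(alpha=0.05, n_tests=30)
print(f"per-site threshold: {threshold:.4f}")
```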

Haplotype reports from the Polymorphic analysis software and reports from DnaSP software (version 5.00.02 (Librado et al. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25, 1451-2 (2009))) were used for subsequent statistical analyses. We analyzed individual TOMM40 SNP variants, TOMM40 haplotypes and length of poly-T repeats for association with LOAD risk for the AS cohort and LOAD age of onset for the DS cohort. Differences in the proportions of specific TOMM40 alleles associated with each ApoE allele or ApoE genotype were compared using Fisher's exact test (two-tailed). Starting with 30 parsimony-informative sites and α=0.05, a Bonferroni correction for the significance of a specific allelic association would require a P value of 0.001. Odds ratios (OR) were calculated as the (number of minor alleles in cases/number of minor alleles in controls)/(number of major alleles in cases/number of major alleles in controls) and reported with 95% confidence intervals. Means for defined LOAD age of onset groups were compared by t tests, two-tailed. A standard F test on group variances was performed to determine whether the t test was calculated assuming equal or unequal variances. Statistical analysis was completed using JMP software (version 8, SAS Institute, Cary, N.C.).
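The odds ratio definition above can be checked against Table 3 with a short Python sketch. The allele counts are those listed for rs1038025 in the "All ApoE genotypes" block. The confidence interval uses the log-odds (Woolf) approximation, which is an assumption here because the text does not name the interval method; the function name is likewise hypothetical.

```python
import math


def allelic_odds_ratio(minor_cases, minor_controls, major_cases, major_controls,
                       z=1.96):
    """Odds ratio as defined in the text, with an approximate 95% CI.

    The interval is exp(ln(OR) +/- z * SE), with SE the square root of the
    summed reciprocal counts (Woolf method; assumed, not stated in the text).
    """
    odds_ratio = (minor_cases / minor_controls) / (major_cases / major_controls)
    standard_error = math.sqrt(
        1 / minor_cases + 1 / minor_controls + 1 / major_cases + 1 / major_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# rs1038025, all ApoE genotypes (Table 3):
# LOAD major (M) = 106, LOAD minor (m) = 41, control M = 39, control m = 23.
odds_ratio, ci_lower, ci_upper = allelic_odds_ratio(41, 23, 106, 39)
print(f"OR={odds_ratio:.2f} (95% CI {ci_lower:.2f}-{ci_upper:.2f})")
```

With these counts the sketch gives OR=0.66 with an interval of 0.35 to 1.23, matching the tabulated row, which suggests the reported intervals are consistent with this approximation.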

Accession Codes: GenBank: TOMM40, translocase of outer mitochondrial membrane 40 homolog, 10452; ApoE, apolipoprotein E, 348.

The foregoing is illustrative of the present invention, and is not to be construed as limiting thereof. The invention is defined by the following claims, with equivalents of the claims to be included therein.

That which is claimed is:
 1. A method of determining increased risk for development of Alzheimer's disease in a subject, comprising: (a) detecting from a biological sample containing DNA taken from said subject the presence or absence of a genetic variant of the TOMM40 gene associated with increased or decreased risk of Alzheimer's disease, wherein said variant is a deletion/insertion polymorphism (DIP) in intron 6 or intron 9 of the TOMM40 gene; and (b) determining said subject is at increased or decreased risk of Alzheimer's disease when said genetic variant is present or absent.
 2. The method of claim 1, wherein said detecting comprises PCR amplification and/or DNA sequencing.
 3. The method of claim 1, further comprising detecting an Apo E genotype of the subject, and wherein said determining said subject is at increased or decreased risk of Alzheimer's disease is further based upon the Apo E genotype.
 4. The method of claim 1, further comprising the step of: (c) administering an anti-Alzheimer's disease active agent to said subject in a treatment effective amount when said subject is determined to be at increased risk of Alzheimer's disease.
 5. The method of claim 4, wherein said administering step is carried out in said subject at an earlier age when said subject is determined to be at increased risk by the presence or absence of said genetic variant as compared to a subject in which said genetic variant is not present or absent.
 6. The method of claim 4, wherein said active agent is selected from the group consisting of acetylcholinesterase inhibitors, NMDA receptor antagonists, peroxisome proliferator-activated receptor agonists or modulators, antibodies, fusion proteins, therapeutic RNA molecules, and combinations thereof.
 7. The method of claim 4, wherein said active agent is a peroxisome proliferator-activated receptor agonist or modulator.
 8. The method of claim 4, wherein said active agent is a thiazolidinedione.
 9. The method of claim 1, wherein said genetic variant of the TOMM40 is a poly-T DIP length at rs10524523.
 10. A method of treating a subject for Alzheimer's disease comprising: administering an anti-Alzheimer's disease active agent to said subject in a treatment-effective amount, said administering carried out at an earlier age when said subject carries a genetic variant of the TOMM40 gene associated with increased risk of Alzheimer's disease as compared to a corresponding subject who does not carry said genetic variant, wherein said genetic variant of the TOMM40 is a deletion/insertion polymorphism (DIP) in intron 6 or intron 9 of the TOMM40 gene.
 11. The method of claim 10, wherein said active agent is selected from the group consisting of acetylcholinesterase inhibitors, NMDA receptor antagonists, peroxisome proliferator-activated receptor agonists or modulators, antibodies, fusion proteins, therapeutic RNA molecules, and combinations thereof.
 12. The method of claim 10, wherein said active agent is a peroxisome proliferator-activated receptor agonist or modulator.
 13. The method of claim 10, wherein said active agent is a thiazolidinedione.
 14. The method of claim 10, wherein said genetic variant of the TOMM40 gene is a poly-T DIP length at rs10524523.
 15. A kit comprising: (A) at least one reagent to specifically detect a poly-T length at rs10524523 of the TOMM40 gene from a biological sample containing DNA from a human subject; (B) buffers, enzymes and reagents for amplifying the DNA via primer-directed amplification; and (C) optionally, instructions for use in amplifying a region of the TOMM40 gene comprising the poly-T at rs10524523.
 16. The kit of claim 15, wherein the at least one reagent comprises a nucleic acid for the primer-directed amplification.
 17. The kit of claim 15, wherein said kit further comprises: (D) at least one reagent to specifically detect an ApoE 3, ApoE 4, or ApoE 2 allele from the biological sample.
 18. The kit of claim 17, wherein the at least one reagent to specifically detect an ApoE 3, ApoE 4, or ApoE 2 allele comprises a nucleic acid for a primer-directed amplification of a region of the ApoE isoform.
 19. The kit of claim 17, wherein the at least one reagent to specifically detect an ApoE 3, ApoE 4, or ApoE 2 allele comprises an antibody that selectively binds an ApoE isoform, or an oligonucleotide probe that selectively binds to DNA encoding an ApoE isoform.
 20. The kit of claim 19, wherein the antibody or oligonucleotide probe is labeled with a detectable group.